NUCLEIC ACID ANALYSIS TECHNIQUES
CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of U.S.S.N. 60/010,471 filed on January 23,
1996 and a continuation-in-part of provisional patent application for "Labeling of Nucleic
Acids" naming Lockhart, Cronin, Lee, Tran, Matsuzaki, McGall and Barone as inventors,
filed on January 9, 1997, both of which are herein incorporated by reference for all
purposes.

BACKGROUND OF THE INVENTION 

A portion of the disclosure of this patent document contains material which 
is subject to copyright protection. The copyright owner has no objection to the xerographic 
reproduction by anyone of the patent document or the patent disclosure in exactly the form
it appears in the Patent and Trademark Office patent file or records, but otherwise reserves
all copyright rights whatsoever.

Many disease states are characterized by differences in the expression levels 
of various genes either through changes in the copy number of the genetic DNA or through 
changes in levels of transcription (e.g. through control of initiation, provision of RNA
precursors, RNA processing, etc.) of particular genes. For example, losses and gains of 
genetic material play an important role in malignant transformation and progression. 
These gains and losses are thought to be "driven" by at least two kinds of genes. 
Oncogenes are positive regulators of tumorigenesis, while tumor suppressor genes are 
negative regulators of tumorigenesis (Marshall, Cell, 64: 313-326 (1991); Weinberg,
Science, 254: 1138-1146 (1991)). Therefore, one mechanism of activating unregulated
growth is to increase the number of genes coding for oncogene proteins or to increase the 
level of expression of these oncogenes (e.g. in response to cellular or environmental 
changes), and another is to lose genetic material or to decrease the level of expression of 
genes that code for tumor suppressors. This model is supported by the losses and gains of
genetic material associated with glioma progression (Mikkelson et al. J. Cell. Biochem.
46: 3-8 (1991)). Thus, changes in the expression (transcription) levels of particular genes
(e.g. oncogenes or tumor suppressors) serve as signposts for the presence and progression
of various cancers. 

Similarly, control of the cell cycle and cell development, as well as diseases, 
are characterized by the variations in the transcription levels of particular genes. Thus, for 
example, a viral infection is often characterized by the elevated expression of genes of the 
particular virus. For example, outbreaks of Herpes simplex, Epstein-Barr virus infections 
(e.g. infectious mononucleosis), cytomegalovirus, Varicella-zoster virus infections,
parvovirus infections, human papillomavirus infections, etc. are all characterized by 
elevated expression of various genes present in the respective virus. Detection of elevated 
expression levels of characteristic viral genes provides an effective diagnostic of the 
disease state. In particular, viruses such as herpes simplex enter quiescent states for
periods of time only to erupt in brief periods of rapid replication. Detection of expression
levels of characteristic viral genes allows detection of such active proliferative (and
presumably infective) states.

The use of "traditional" hybridization protocols for monitoring or
quantifying gene expression is problematic. For example, two or more gene products of
approximately the same molecular weight will prove difficult or impossible to distinguish 
in a Northern blot because they are not readily separated by electrophoretic methods. 
Similarly, as hybridization efficiency and cross-reactivity vary with the particular
subsequence (region) of a gene being probed, it is difficult to obtain an accurate and reliable
measure of gene expression with one, or even a few, probes to the target gene. 

The development of VLSIPS™ technology provided methods for 
synthesizing arrays of many different oligonucleotide probes that occupy a very small 
surface area. See U.S. Patent No. 5,143,854 and PCT patent publication No. WO 
90/15070. U.S. Patent application Serial No. 082,937, filed June 25, 1993, describes 
methods for making arrays of oligonucleotide probes that can be used to provide the 
complete sequence of a target nucleic acid and to detect the presence of a nucleic acid 
containing a specific nucleotide sequence. 

Previous methods of measuring nucleic acid abundance differences or
changes in the expression of various genes (e.g., differential display, SAGE, cDNA
sequencing, clone spotting, etc.) require assumptions about, or prior knowledge regarding,
the target sequences in order to design appropriate sequence-specific probes. Other
methods, such as subtractive hybridization, do not require prior sequence knowledge, but 
also do not directly provide sequence information regarding differentially expressed 
nucleic acids. 

SUMMARY OF THE INVENTION

The present invention, in one embodiment, provides methods of monitoring 
the expression of a multiplicity of preselected genes (referred to herein as "expression 
monitoring"). In another embodiment this invention provides a way of identifying 
differences in the compositions of two or more nucleic acid (e.g., RNA or DNA) samples.
Where the nucleic acid abundances reflect expression levels in biological samples from
which the samples are derived, the invention provides a method for identifying differences
in expression profiles between two or more samples. These "generic difference screening
methods" are rapid, simple to apply, require no a priori assumptions regarding the
particular sequences whose expression may differ between the two samples, and provide
direct sequence information regarding the nucleic acids whose abundances differ between
the samples.

In one embodiment, this invention provides a method of identifying 
differences in nucleic acid levels between two or more nucleic acid samples. The method 
involves the steps of: (a) providing one or more oligonucleotide arrays, said arrays
comprising probe oligonucleotides attached to a surface; (b) hybridizing said nucleic acid
samples to said one or more arrays to form hybrid duplexes between nucleic acids in said
nucleic acid samples and probe oligonucleotides in said one or more arrays that are
complementary to said nucleic acids or subsequences thereof; (c) contacting said one or
more arrays with a nucleic acid ligase; and (d) determining differences in hybridization
between said nucleic acid samples wherein said differences in hybridization indicate 
differences in said nucleic acid levels. 
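
The final step of this method, determining differences in hybridization between samples, can be pictured as a per-probe comparison of measured intensities. The following is a minimal sketch only. The function name, the log2-ratio statistic, the 2-fold cutoff, and the example intensities are assumptions made for illustration and are not specified in this disclosure.

```python
import math

def differing_probes(sample_a, sample_b, min_fold=2.0, floor=1.0):
    """Return probes whose hybridization intensity differs between two samples.

    sample_a, sample_b: dicts mapping probe sequence -> measured intensity.
    min_fold: hypothetical fold-change cutoff (an assumption, not from the text).
    floor: small positive value that keeps the ratio defined for weak signals.
    """
    threshold = math.log2(min_fold)
    hits = {}
    # Only probes measured in both samples can be compared.
    for probe in sample_a.keys() & sample_b.keys():
        a = max(sample_a[probe], floor)
        b = max(sample_b[probe], floor)
        log_ratio = math.log2(b / a)
        if abs(log_ratio) >= threshold:
            hits[probe] = log_ratio
    return hits

# Hypothetical intensities for three probes on the "same array" in two samples.
sample_1 = {"ACGTACGT": 1200.0, "TTGACCAA": 310.0, "GGCATCGA": 95.0}
sample_2 = {"ACGTACGT": 1150.0, "TTGACCAA": 1400.0, "GGCATCGA": 20.0}
hits = differing_probes(sample_1, sample_2)
print(sorted(hits))
```

Because each reported probe has a known sequence, a list of this kind carries direct sequence information about the nucleic acids whose abundances differ.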

In another embodiment, the method of identifying differences in nucleic 
acid levels between two or more nucleic acid samples involves the steps of: (a) providing 
one or more oligonucleotide arrays comprising probe oligonucleotides wherein said probe
oligonucleotides comprise a constant region and a variable region; (b) hybridizing said
nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic
acids in said nucleic acid samples and said variable regions that are complementary to said
nucleic acids or subsequences thereof; and (c) determining differences in hybridization
between said nucleic acid samples wherein said differences in hybridization indicate
differences in said nucleic acid levels.

In yet another embodiment, the method of identifying differences in nucleic 
acid levels between two or more nucleic acid samples involves the steps of: (a) providing 
one or more high density oligonucleotide arrays; (b) hybridizing said nucleic acid samples 
to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic
acid samples and probe oligonucleotides in said one or more arrays that are complementary
to said nucleic acids or subsequences thereof; and (c) determining the differences in
hybridization between said nucleic acid samples wherein said differences in hybridization
indicate differences in said nucleic acid levels.

In still yet another embodiment, the method of identifying differences in
nucleic acid levels between two or more nucleic acid samples involves the steps of: (a)
providing one or more oligonucleotide arrays each comprising probe oligonucleotides
wherein said probe oligonucleotides are not chosen to hybridize to nucleic acids derived
from particular preselected genes or mRNAs; (b) hybridizing said nucleic acid samples to
said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid
samples and probe oligonucleotides in said one or more arrays that are complementary to
said nucleic acids or subsequences thereof; and (c) determining differences in
hybridization between said nucleic acid samples wherein said differences in hybridization
indicate differences in said nucleic acid levels.

In another embodiment, the method of identifying differences in nucleic
acid levels between two or more nucleic acid samples involves the steps of: (a) providing
one or more oligonucleotide arrays each comprising probe oligonucleotides wherein said
probe oligonucleotides comprise nucleotide sequences or subsequences selected
according to a process selected from the group consisting of a random selection, a
haphazard selection, a nucleotide composition biased selection, and all possible
oligonucleotides of a preselected length; (b) hybridizing said nucleic acid samples to said
one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid
samples and probe oligonucleotides in said one or more arrays that are complementary to
said nucleic acids or subsequences thereof; and (c) determining differences in hybridization 
between said nucleic acid samples wherein said differences in hybridization indicate 
differences in said nucleic acid levels. 

In another embodiment, the methods of identifying differences in nucleic 
acid levels between two or more nucleic acid samples involve the steps of: (a) 
providing one or more oligonucleotide arrays each comprising probe oligonucleotides 
wherein said probe oligonucleotides comprise a nucleotide sequence or subsequences 
selected according to a process selected from the group consisting of a random selection, a 
haphazard selection, a nucleotide composition biased selection, and all possible 
oligonucleotides of a preselected length; (b) providing software describing the location and 
sequence of probe oligonucleotides on said array; (c) hybridizing said nucleic acid samples 
to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic 
acid samples and probe oligonucleotides in said one or more arrays that are complementary 
to said nucleic acids or subsequences thereof; and (d) operating said software such that said 
hybridizing indicates differences in said nucleic acid levels. 

This invention also provides methods of simultaneously monitoring the 
expression of a multiplicity of genes. In one embodiment these methods involve (a) 
providing a pool of target nucleic acids comprising RNA transcripts of one or more of said 
genes, or nucleic acids derived from said RNA transcripts; (b) hybridizing said pool of 
nucleic acids to an oligonucleotide array comprising probe oligonucleotides immobilized 
on a surface; (c) contacting said oligonucleotide array with a ligase; and (d) quantifying the 
hybridization of said nucleic acids to said array wherein said quantifying provides a 
measure of the levels of transcription of said genes. 

Still yet another method of identifying differences in nucleic acid levels 
between two or more nucleic acid samples involves the steps of: (a) providing one or more 
arrays of oligonucleotides each array comprising pairs of probe oligonucleotides where the 
members of each pair of probe oligonucleotides differ from each other in preselected 
nucleotides; (b) hybridizing said nucleic acid samples to said one or more arrays to form 
hybrid duplexes between nucleic acids in said nucleic acid samples and probe
oligonucleotides in said one or more arrays that are complementary to said nucleic acids or
subsequences thereof; and (c) determining the differences in hybridization between said nucleic
acid samples wherein said differences in hybridization indicate differences in said nucleic 
acid levels. 

Another method of simultaneously monitoring the expression of a
multiplicity of genes involves the steps of: (a) providing one or more oligonucleotide
arrays comprising probe oligonucleotides wherein said probe oligonucleotides comprise a 
constant region and a variable region; (b) providing a pool of target nucleic acids 
comprising RNA transcripts of one or more of said genes, or nucleic acids derived from 
said RNA transcripts; (c) hybridizing said pool of nucleic acids to an array of 
oligonucleotide probes immobilized on a surface; and (d) quantifying the hybridization of 
said nucleic acids to said array wherein said quantifying provides a measure of the levels of
transcription of said genes.

This invention additionally provides methods of making a nucleic acid array 
for identifying differences in nucleic acid levels between two or more nucleic acid 
samples. In one embodiment the method involves the steps of: (a) providing an
oligonucleotide array comprising probe oligonucleotides wherein said probe 
oligonucleotides comprise a constant region and a variable region; (b) hybridizing one or 
more of said nucleic acid samples to said arrays to form hybrid duplexes of said variable 
region and nucleic acids in said nucleic acid samples comprising subsequences 
complementary to said variable region; (c) attaching the sample nucleic acids comprising 
said hybrid duplexes to said array of probe oligonucleotides; and (d) removing unattached 
nucleic acids to provide a high density oligonucleotide array bearing sample nucleic acids
attached to said array.

In another embodiment the method of making a nucleic acid array for
identifying differences in nucleic acid levels between two or more nucleic acid samples
involves the steps of: (a) providing a high density array; (b) contacting said array with one or
more of said two or more nucleic acid samples whereby nucleic acids of said one of said
two or more nucleic acid samples form hybrid duplexes with probe oligonucleotides in said 
arrays; (c) attaching the sample nucleic acids comprising said hybrid duplexes to said array 
of probe oligonucleotides; and (d) removing unattached nucleic acids to provide a high 
density oligonucleotide array bearing sample nucleic acids attached to said array. 



This invention additionally provides kits for practice of the methods 
described herein. One kit comprises a container containing one or more oligonucleotide 
arrays, said arrays comprising probe oligonucleotides attached to a surface; and a container
containing a ligase. Another kit comprises a container containing one or more
oligonucleotide arrays, said arrays comprising probe oligonucleotides wherein said probe
oligonucleotides comprise a constant region and a variable region. This kit optionally
includes a constant oligonucleotide complementary to said constant region or a subsequence
thereof. 

Preferred high density oligonucleotide arrays of this invention comprise 
more than 100 different probe oligonucleotides wherein: each different probe 
oligonucleotide is localized in a predetermined region of the array; each different probe 
oligonucleotide is attached to a surface through a terminal covalent bond; and the density 
of said different probe oligonucleotides is greater than about 60 different oligonucleotides
per 1 cm². The high density arrays can be used in all of the array-based methods discussed
herein. High density arrays used for expression monitoring will typically include
oligonucleotide probes selected to be complementary to a nucleic acid derived from one or
more preselected genes. In contrast, generic difference screening arrays may contain probe
oligonucleotides selected randomly, haphazardly, arbitrarily, or including sequences or
subsequences comprising all possible nucleic acid sequences of a particular (preselected)
length.

In a preferred embodiment, pools of oligonucleotides or oligonucleotide 
subsequences comprising all possible nucleic acids of a particular length are selected from 
the group consisting of all possible 6 mers, all possible 7 mers, all possible 8 mers, all
possible 9 mers, all possible 10 mers, all possible 11 mers, and all possible 12 mers.
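
To give a sense of scale for "all possible" oligonucleotides of a preselected length, the sketch below enumerates every sequence over the four standard DNA bases and prints the count (4 raised to the length) for lengths 6 through 12. The function name is hypothetical and the code is illustrative only; it does not describe how an array is synthesized.

```python
from itertools import product

def all_possible_oligos(length, alphabet="ACGT"):
    """Yield every oligonucleotide sequence of the given length."""
    for bases in product(alphabet, repeat=length):
        yield "".join(bases)

# Number of distinct probes needed to cover each preselected length.
for n in range(6, 13):
    print(f"{n} mers: {4 ** n} sequences")

# Materialize the smallest set as an example.
six_mers = list(all_possible_oligos(6))
```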

This invention also provides methods of labeling a nucleic acid. In one 
embodiment, this method involves the steps of: (a) providing a nucleic acid; (b) 
amplifying said nucleic acid to form amplicons; (c) fragmenting said amplicons to form 
fragments of said amplicons; and (d) coupling a labeled moiety to at least one of said 
fragments. 

In another embodiment, the methods involve the steps of: (a) providing a 
nucleic acid; (b) transcribing said nucleic acid to form a transcribed nucleic acid; (c)
fragmenting said transcribed nucleic acid to form fragments of said transcribed nucleic
acid; and (d) coupling a labeled moiety to at least one of said fragments. 

In yet another embodiment, the methods involve the steps of: (a) providing 
at least one nucleic acid coupled to a support; (b) providing a labeled moiety capable of 
being coupled with a terminal transferase to said nucleic acid; (c) providing said terminal
transferase; and (d) coupling said labeled moiety to said nucleic acid using said terminal 
transferase. 

In still another embodiment, the methods involve the steps of: (a) providing 
at least two nucleic acids coupled to a support; (b) increasing the number of monomer units 
of said nucleic acids to form a common nucleic acid tail on said at least two nucleic acids;
(c) providing a labeled moiety capable of recognizing said common nucleic acid tails; and
(d) contacting said common nucleic acid tails and said labeled moiety.

In still yet another embodiment, the methods involve the steps of: (a)
providing at least one nucleic acid coupled to a support; (b) providing a labeled moiety
capable of being coupled with a ligase to said nucleic acid; (c) providing said ligase; and
(d) coupling said labeled moiety to said nucleic acid using said ligase. 

This invention also provides compounds of the formulas described herein.

Definitions.

An array of oligonucleotides as used herein refers to a multiplicity of
different (sequence) oligonucleotides attached (preferably through a single terminal
covalent bond) to one or more solid supports where, when there is a multiplicity of
supports, each support bears a multiplicity of oligonucleotides. The term "array" can refer
to the entire collection of oligonucleotides on the support(s) or to a subset thereof. The
term "same array" when used to refer to two or more arrays is used to mean arrays that
have substantially the same oligonucleotide species thereon in substantially the same
abundances. The spatial distribution of the oligonucleotide species may differ between the
two arrays, but, in a preferred embodiment, it is substantially the same. It is recognized
that even where two arrays are designed and synthesized to be identical there are variations
in the abundance, composition, and distribution of oligonucleotide probes. These
variations are preferably insubstantial and/or compensated for by the use of controls as
described herein. 

The phrase "massively parallel screening" refers to the simultaneous 
screening of at least about 100, preferably about 1000, more preferably about 10,000 and
most preferably about 1,000,000 different nucleic acid hybridizations. 

The terms "nucleic acid" or "nucleic acid molecule" refer to a 
deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form,
and unless otherwise limited, would encompass known analogs of natural nucleotides that 
can function in a similar manner as naturally occurring nucleotides. 

An oligonucleotide is a single-stranded nucleic acid ranging in length from 
2 to about 1000 nucleotides, more typically from 2 to about 500 nucleotides in length. 

As used herein a "probe" is defined as an oligonucleotide capable of binding 
to a target nucleic acid of complementary sequence through one or more types of chemical 
bonds, usually through complementary base pairing, usually through hydrogen bond 
formation. As used herein, an oligonucleotide probe may include natural (i.e. A, G, C, or
T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in an
oligonucleotide probe may be joined by a linkage other than a phosphodiester bond, so
long as it does not interfere with hybridization. Thus, oligonucleotide probes may be
peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than
phosphodiester linkages.

The term "target nucleic acid" refers to a nucleic acid (often derived from a 
biological sample and hence referred to also as a sample nucleic acid), to which the 
oligonucleotide probe specifically hybridizes. It is recognized that the target nucleic acids 
can be derived from essentially any source of nucleic acids (e.g., including, but not limited
to, chemical syntheses, amplification reactions, forensic samples, etc.). It is either the
presence or absence of one or more target nucleic acids that is to be detected, or the 
amount of one or more target nucleic acids that is to be quantified. The target nucleic 
acid(s) that are detected preferentially have nucleotide sequences that are complementary 
to the nucleic acid sequences of the corresponding probe(s) to which they specifically bind 
(hybridize). The term target nucleic acid may refer to the specific subsequence of a larger 
nucleic acid to which the probe specifically hybridizes, or to the overall sequence (e.g.,
gene or mRNA) whose abundance (concentration) and/or expression level it is desired to
detect. The difference in usage will be apparent from context. 

A "ligatable oligonucleotide" or "ligatable probe" or "ligatable 
oligonucleotide probe" refers to an oligonucleotide that is capable of being ligated to 
another oligonucleotide by the use of a ligase (e.g., T4 DNA ligase). The ligatable 
oligonucleotide is preferably a deoxyribonucleotide. The nucleotides comprising the 
ligatable oligonucleotide are preferably the "standard" nucleotides: A, G, C, and T or U.
However, derivatized, modified, or alternative nucleotides (e.g., inosine) can be present as
long as their presence does not interfere with the ligation reaction. The ligatable probe
may be labeled or otherwise modified as long as the label does not interfere with the
ligation reaction. Similarly, the internucleotide linkages can be modified as long as the
modification does not interfere with ligation. Thus, in some instances, the ligatable 
oligonucleotide can be a peptide nucleic acid.

"Subsequence" refers to a sequence of nucleic acids that comprises a part of
a longer sequence of nucleic acids.

A "wobble" refers to a degeneracy at a particular position in an 
oligonucleotide. A fully degenerate or "4 way" wobble refers to a collection of nucleic
acids (e.g., oligonucleotide probes) having A, G, C, or T for DNA or A, G, C, or U for RNA
at the wobble position. A wobble may be approximated by the replacement of the
nucleotide with inosine which will base pair with A, G, C, or T or U. Typically, an
oligonucleotide containing a fully degenerate wobble is prepared during chemical synthesis
by using a mixture of four different nucleotide monomers
at the particular coupling step in which the wobble is to be introduced.
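
As an illustration of what a fully degenerate ("4 way") wobble represents, the sketch below expands an oligonucleotide written with a wobble symbol into the collection of concrete sequences it stands for. Writing the wobble position as "N" is a notational assumption for this example, and the function name is hypothetical.

```python
def expand_wobble(oligo, wobble_symbol="N", alphabet="ACGT"):
    """Expand each wobble position into all four bases.

    Use alphabet="ACGU" for RNA. Each additional wobble position
    multiplies the size of the collection by four.
    """
    sequences = [""]
    for base in oligo:
        choices = alphabet if base == wobble_symbol else base
        sequences = [prefix + c for prefix in sequences for c in choices]
    return sequences

print(expand_wobble("ACNGT"))  # one wobble position -> four sequences
```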

The term "cross-linking" when used in reference to cross-linking nucleic
acids refers to attaching nucleic acids such that they are not separated under typical
conditions that are used to denature complementary nucleic acid sequences. Cross-linking
preferably involves the formation of covalent linkages between the nucleic acids. Methods 
of cross-linking nucleic acids are described herein. 

The phrase "coupled to a support" means bound directly or indirectly 
thereto including attachment by covalent binding, hydrogen bonding, ionic interaction,
hydrophobic interaction, or otherwise. 



"Amplicons" are the products of the amplification of nucleic acids by PCR
or otherwise.

"Transcribing a nucleic acid" means the formation of a ribonucleic acid 
from a deoxyribonucleic acid and the converse (the formation of a deoxyribonucleic acid 
from a ribonucleic acid). A nucleic acid can be transcribed by DNA-dependent RNA 
polymerase, reverse transcriptase, or otherwise. 

A "labeled moiety" means a moiety capable of being detected by the various
methods discussed herein or known in the art.

The term "complexity" is used here according to the standard meaning of this
term as established by Britten et al. Methods of Enzymol. 29:363 (1974). See also Cantor
and Schimmel, Biophysical Chemistry: Part III at 1228-1230 for further explanation of
nucleic acid complexity.

"Bind(s) substantially" refers to complementary hybridization between a 
probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be 
accommodated by reducing the stringency of the hybridization media to achieve the 
desired detection of the target polynucleotide sequence. 

The phrase "hybridizing specifically to" refers to the binding, duplexing, or
hybridizing of a molecule preferentially to a particular nucleotide sequence under stringent
conditions when that sequence is present in a complex mixture (e.g., total cellular) DNA or
RNA. The term "stringent conditions" refers to conditions under which a probe will
hybridize preferentially to its target subsequence, and to a lesser extent to, or not at all to,
other sequences. Stringent conditions are sequence-dependent and will be different in 
different circumstances. Longer sequences hybridize specifically at higher temperatures. 
Generally, stringent conditions are selected to be about 5 °C lower than the thermal melting 
point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the
temperature (under defined ionic strength, pH, and nucleic acid concentration) at which
50% of the probes complementary to the target sequence hybridize to the target sequence at
equilibrium. (As the target sequences are generally present in excess, at Tm, 50% of the
probes are occupied at equilibrium). Typically, stringent conditions will be those in which 
the salt concentration is at least about 0.01 to 1.0 M Na ion concentration (or other salts) at 
pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50
nucleotides). Stringent conditions may also be achieved with the addition of destabilizing
agents such as formamide.
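
The relationship "about 5 °C lower than Tm" can be made concrete with a rough Tm estimate. The sketch below uses the Wallace rule of thumb, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which is not taken from this disclosure and is only a coarse approximation for short oligonucleotides in solution at high salt; probes attached to a surface may behave differently. The function names and the example probe are hypothetical.

```python
def wallace_tm(probe):
    """Rough Tm estimate in deg C: 2*(A+T) + 4*(G+C) (Wallace rule of thumb)."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc

def stringent_temperature(probe, offset=5):
    """Hybridization temperature about `offset` deg C below the estimated Tm."""
    return wallace_tm(probe) - offset

probe = "ACGTTGCAAGCTTGCA"  # hypothetical 16-mer with 8 A/T and 8 G/C bases
print(wallace_tm(probe), stringent_temperature(probe))
```

In practice Tm depends on ionic strength, pH, and nucleic acid concentration, as the definition above states, so an estimate of this kind is at most a starting point for empirical optimization.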

The term "perfect match probe" refers to a probe that has a sequence that is 
perfectly complementary to a particular target sequence. The test probe is typically 
perfectly complementary to a portion (subsequence) of the target sequence. The perfect 
match (PM) probe can be a "test probe", a "normalization control" probe, an expression 
level control probe and the like. A perfect match control or perfect match probe is, 
however, distinguished from a "mismatch control" or "mismatch probe." In the case of 
expression monitoring arrays, perfect match probes are typically preselected (designed) to 
be complementary to particular sequences or subsequences of target nucleic acids (e.g., 
particular genes). In contrast, in generic difference screening arrays, the particular target 
sequences are typically unknown. In the latter case, perfect match probes cannot be
preselected. The term perfect match probe in this context is to distinguish that probe from
a corresponding "mismatch control" that differs from the perfect match in one or more
particular preselected nucleotides as described below.

The term "mismatch control" or "mismatch probe", in expression 
monitoring arrays, refers to probes whose sequence is deliberately selected not to be 
perfectly complementary to a particular target sequence. For each mismatch (MM) control 
in a high-density array there preferably exists a corresponding perfect match (PM) probe 
that is perfectly complementary to the same particular target sequence. In "generic" (e.g.,
random, arbitrary, haphazard, etc.) arrays, since the target nucleic acid(s) are unknown,
perfect match and mismatch probes cannot be a priori determined, designed, or selected.
In this instance, the probes are preferably provided as pairs where each pair of probes differ
in one or more preselected nucleotides. Thus, while it is not known a priori which of the
probes in the pair is the perfect match, it is known that when one probe specifically
hybridizes to a particular target sequence, the other probe of the pair will act as a mismatch
control for that target sequence. It will be appreciated that the perfect match and mismatch
probes need not be provided as pairs, but may be provided as larger collections (e.g., 3, 4,
5, or more) of probes that differ from each other in particular preselected nucleotides.
While the mismatch(es) may be located anywhere in the mismatch probe, terminal
mismatches are less desirable as a terminal mismatch is less likely to prevent hybridization
of the target sequence. In a particularly preferred embodiment, the mismatch is located at
or near the center of the probe such that the mismatch is most likely to destabilize the
duplex with the target sequence under the test hybridization conditions. In a particularly
preferred embodiment, perfect matches differ from mismatch controls in a single
centrally-located nucleotide.
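
The pairing of a perfect match probe with a mismatch control that differs in a single centrally-located nucleotide can be sketched as follows. The choice to substitute the central base with its Watson-Crick complement is an assumption made for this example (the text above only requires that the probes differ at a preselected position), and the function names and target sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def perfect_match_probe(target_subsequence):
    """Reverse complement of the target subsequence, written 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(target_subsequence))

def mismatch_control(pm_probe):
    """Copy of the PM probe with the central base replaced.

    The substitution used here (central base -> its complement) is an
    illustrative assumption, not a requirement of the definition.
    """
    center = len(pm_probe) // 2
    return pm_probe[:center] + COMPLEMENT[pm_probe[center]] + pm_probe[center + 1:]

target = "ATGGCGTACGTTAGCCTAGGATCCA"  # hypothetical 25-base target subsequence
pm = perfect_match_probe(target)
mm = mismatch_control(pm)
differing_positions = [i for i in range(len(pm)) if pm[i] != mm[i]]
print(pm, mm, differing_positions)
```

For a generic array, the same pairing applies without knowing the target in advance: whichever member of the pair hybridizes specifically serves as the perfect match, and the other serves as its mismatch control.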

The terms "background" or "background signal intensity" refer to 
hybridization signals resulting from non-specific binding, or other interactions, between 
the labeled target nucleic acids and components of the oligonucleotide array (e.g., the 
oligonucleotide probes, control probes, the array substrate, etc.). Background signals may 
also be produced by intrinsic fluorescence of the array components themselves. A single 
background signal can be calculated for the entire array, or a different background signal 
may be calculated for each region of the array. In a preferred embodiment, background is 
calculated as the average hybridization signal intensity for the lowest 1% to 10% of the 
probes in the array, or region of the array. In expression monitoring arrays (i.e., where
probes are preselected to hybridize to specific nucleic acids (genes)), a different 
background signal may be calculated for each target nucleic acid. Where a different 
background signal is calculated for each target gene, the background signal is calculated 
for the lowest 1% to 10% of the probes for each gene. Of course, one of skill in the art will 
appreciate that where the probes to a particular gene hybridize well and thus appear to be 
specifically binding to a target sequence, they should not be used in a background signal 
calculation. Alternatively, background may be calculated as the average hybridization 
signal intensity produced by hybridization to probes that are not complementary to any 
sequence found in the sample (e.g. probes directed to nucleic acids of the opposite sense or 
to genes not found in the sample such as bacterial genes where the sample is of mammalian 
origin). Background can also be calculated as the average signal intensity produced by 
regions of the array that lack any probes at all. 
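
The first background calculation described above (the average intensity of the lowest 1% to 10% of the probes) can be sketched as follows. This is a minimal illustration only; the function name `background_intensity` and its `percent` parameter are hypothetical and are not part of the disclosure.

```python
from statistics import mean


def background_intensity(intensities, percent=5):
    """Average the lowest `percent` of probe intensities (at least one probe).

    `intensities` holds the signals for an array, a region of an array,
    or the probes for a single gene. `percent` is assumed to lie in the
    1 to 10 range given in the text, but any value in (0, 100] is accepted.
    """
    if not intensities:
        raise ValueError("at least one intensity is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(intensities)
    # Number of probes that fall in the lowest `percent` of the distribution.
    count = max(1, int(len(ordered) * percent / 100))
    return mean(ordered[:count])
```

The same function can be applied per region or per gene simply by passing the corresponding subset of intensities.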

The term "quantifying" when used in the context of quantifying nucleic acid 
abundances or concentrations (e.g., transcription levels of a gene) can refer to absolute or 
to relative quantification. Absolute quantification may be accomplished by inclusion of 
known concentration(s) of one or more target nucleic acids (e.g. control nucleic acids such 
as BioB, or known amounts of the target nucleic acids themselves) and referencing the
hybridization intensity of unknowns with the known target nucleic acids (e.g. through
generation of a standard curve). Alternatively, relative quantification can be accomplished
by comparison of hybridization signals between two or more genes, or between two or
more treatments to quantify the changes in hybridization intensity and, by implication,
transcription level.
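
Absolute quantification against a standard curve can be sketched as below. The sketch assumes a straight-line fit on log/log axes, which is one common choice and not necessarily the fit used in the Examples; the function names and the numbers in the usage note are hypothetical.

```python
import math


def fit_loglog_standard_curve(concentrations, intensities):
    """Least-squares line through (log10 concentration, log10 intensity)."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(intensity, slope, intercept):
    """Invert the fitted line to read a concentration off the curve."""
    return 10 ** ((math.log10(intensity) - intercept) / slope)
```

For instance, with hypothetical spiked controls at 1, 10, and 100 pM giving intensities of 20, 200, and 2,000, an unknown with intensity 500 would be read off the fitted curve at about 25 pM.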

The "percentage of sequence identity" or "sequence identity" is determined
by comparing two optimally aligned sequences or subsequences over a comparison
window or span, wherein the portion of the polynucleotide sequence in the comparison
window may optionally comprise additions or deletions (i.e., gaps) as compared to the
reference sequence (which does not comprise additions or deletions) for optimal alignment
of the two sequences. The percentage is calculated by determining the number of positions
at which the identical subunit (e.g. nucleic acid base or amino acid residue) occurs in both
sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison and multiplying
the result by 100 to yield the percentage of sequence identity. Percentage sequence identity
when calculated using the programs GAP or BESTFIT (see below) is calculated using
default gap weights.
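
The arithmetic of the definition above can be sketched for two sequences that have already been aligned. The sketch assumes gaps are written as "-" and that the comparison window is the full alignment; it does not perform the alignment itself, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of window positions at which both sequences carry the same subunit.

    Both inputs must already be aligned to equal length. A gap never
    counts as a match, but gapped positions still count toward the
    total number of positions in the window.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)
```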

Methods of alignment of sequences for comparison are well known in the
art. Optimal alignment of sequences for comparison may be conducted by the local
homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the
homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by
the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:
2444 (1988), by computerized implementations of these algorithms (including, but not
limited to CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View,
California, GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software
Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wisconsin, USA),
or by inspection. In particular, methods for aligning sequences using the CLUSTAL
program are well described by Higgins and Sharp in Gene, 73: 237-244 (1988) and in
CABIOS 5: 151-153 (1989).

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a schematic of expression monitoring using oligonucleotide
arrays. Extracted poly(A)+ RNA is converted to cDNA, which is then transcribed in the
presence of labeled ribonucleotide triphosphates. L is either biotin or a dye such as
fluorescein. RNA is fragmented with heat in the presence of magnesium ions.
Hybridizations are carried out in a flow cell that contains the two-dimensional DNA probe
arrays. Following a brief washing step to remove unhybridized RNA, the arrays are
scanned using a scanning confocal microscope. Alternatives in which cellular mRNA is
directly labeled without a cDNA intermediate are described in the Examples. Image
analysis software converts the scanned array images into text files in which the observed
intensities at specific physical locations are associated with particular probe sequences.

Fig. 2A shows a fluorescent image of a high density array containing over
16,000 different oligonucleotide probes. The image was obtained following hybridization
(15 hours at 40°C) of biotin-labeled randomly fragmented sense RNA transcribed from the
murine B cell (T10) cDNA library, and spiked at the level of 1:3,000 (50 pM, equivalent to
about 100 copies per cell) with 13 specific RNA targets. The brightness at any location is
indicative of the amount of labeled RNA hybridized to the particular oligonucleotide
probe. Fig. 2B shows a small portion of the array (the boxed region of Fig. 2A) containing
probes for IL-2 and IL-3 RNAs. For comparison, Fig. 2C shows the same region of
the array following hybridization with an unspiked T10 RNA sample (T10 cells do not
express IL-2 and IL-3). The variation in the signal intensity was highly reproducible and
reflected the sequence dependence of the hybridization efficiencies. The central cross and
the four corners of the array contain a control sequence that is complementary to a
biotin-labeled oligonucleotide that was added to the hybridization solution at a constant
concentration (50 pM). The sharpness of the images near the boundaries of the features
was limited by the resolution of the reading device (11.25 µm) and not by the spatial
resolution of the array synthesis. The pixels in the border regions of each synthesis feature
were systematically ignored in the quantitative analysis of the images.

Fig. 3 provides a log/log plot of the hybridization intensity (average of the
PM-MM intensity differences for each gene) versus concentration for 11 different RNA
targets. The hybridization signals were quantitatively related to target concentration. The
experiments were performed as described in the Examples herein and in Fig. 2. The ten
cytokine RNAs (plus bioB) were spiked into labeled T10 RNA at levels ranging from
1:300,000 to 1:3,000. The signals continued to increase with increased concentration up to
frequencies of 1:300, but the response became sublinear at the high levels due to saturation
of the probe sites. The linear range can be extended to higher concentrations by using
shorter hybridization times. RNAs from genes expressed in T10 cells (IL-10, β-actin and
GAPDH) were also detected at levels consistent with results obtained by probing cDNA
libraries.

Fig. 4 shows cytokine mRNA levels in the murine 2D6 T helper cell line at
different times following stimulation with PMA and a calcium ionophore. Poly(A)+ RNA
was extracted at 0, 2, 6, and 24 hours following stimulation and converted to double
stranded cDNA containing an RNA polymerase promoter. The cDNA pool was then
transcribed in the presence of biotin labeled ribonucleotide triphosphates, fragmented, and
hybridized to the oligonucleotide probe arrays for 2 and 22 hours. The fluorescence
intensities were converted to RNA frequencies by comparison with the signals obtained for
a bacterial RNA (biotin synthetase) spiked into the samples at known amounts prior to
hybridization. A signal of 50,000 corresponds to a frequency of approximately 1:100,
a signal of 1,000 to a frequency of 1:5,000, and a signal of 100 to a frequency of 1:50,000.
RNAs for IL-2, IL-4, IL-6, and IL-12p40 were not detected above the level of approximately
1:200,000 in these experiments. The error bars reflect the estimated uncertainty (25 percent)
in the level for a given RNA relative to the level for the same RNA at a different time point.
The relative uncertainty estimate was based on the results of repeated spiking experiments, and
on repeated measurements of IL-10, β-actin and GAPDH RNAs in preparations from both
T10 and 2D6 cells (unstimulated). The uncertainty in the absolute frequencies includes
message-to-message differences in the hybridization efficiency as well as differences in the
mRNA isolation, cDNA synthesis, and RNA synthesis and labeling steps. The uncertainty
in the absolute frequencies is estimated to be a factor of three.

Fig. 5 shows a fluorescence image of an array containing over 63,000
different oligonucleotide probes for 118 genes. The image was obtained following
overnight hybridization of a labeled murine B cell RNA sample. Each square synthesis
region is 50 x 50 µm and contains 10^7 to 10^8 copies of a specific oligonucleotide. The
array was scanned at a resolution of 7.5 µm in approximately 15 minutes. The bright rows
indicate RNAs present at high levels. Lower level RNAs were unambiguously detected
based on quantitative evaluation of the hybridization patterns. A total of 21 murine RNAs
were detected at levels ranging from approximately 1:300,000 to 1:100. The cross in the
center, the checkerboard in the corners, and the MUR-1 region at the top contain probes
complementary to a labeled control oligonucleotide that was added to all samples.

Fig. 6 shows an example of a computer system used to execute the software
of an embodiment of the present invention.

Fig. 7 shows a system block diagram of a typical computer system used to
execute the software of an embodiment of the present invention.

Fig. 8 shows the high level flow of a process of monitoring the expression 
of a gene by comparing hybridization intensities of pairs of perfect match and mismatch 
probes. 

Fig. 9 shows the flow of a process of determining if a gene is expressed
utilizing a decision matrix.

Figs. 10A and 10B show the flow of a process of determining the 
expression of a gene by comparing baseline scan data and experimental scan data. 

Fig. 11 shows the flow of a process of increasing the number of probes for
monitoring the expression of genes after the number of probes has been reduced or pruned. 

Figs. 12a and 12b illustrate the probe oligonucleotide/ligation reaction
system. Fig. 12a generally illustrates the various components of the probe
oligonucleotide/ligation reaction system. Fig. 12b illustrates discrimination of
non-perfectly complementary target:oligonucleotide hybrids using the probe
oligonucleotide/ligation reaction system.

Figs. 13a, 13b, 13c, and 13d illustrate the various components of
ligation/hybridization reactions and various ligation strategies. Fig. 13a
illustrates various components of the ligation/hybridization reaction, some of which are
optional in various embodiments. Fig. 13b illustrates a ligation strategy that discriminates
mismatches at the terminus of the probe oligonucleotide. Fig. 13c illustrates a ligation
strategy that discriminates mismatches at the terminus of the sample oligonucleotide. Fig.
13d illustrates a method for improving the discrimination at both the probe terminus and
the sample terminus.

Figs. 14a, 14b, 14c, and 14d illustrate ligation discrimination used in
conjunction with a restriction digest of the sample nucleic acid. Fig. 14a shows the
recognition site and cleavage pattern of SacI (a 6 cutter) and Hsp92 II (a 4 cutter). Fig. 14b
illustrates the effect of SacI cleavage on a (target) nucleic acid sample. Fig. 14c illustrates
a 6 Mb genome (i.e., E. coli) digested with SacI and SphI generating ~1 kb genomic
fragments with a 5' C. Fig. 14d illustrates the hybridization/ligation of these fragments to a
generic difference screening chip and their subsequent use as probes to hybridize to the
appropriate nucleic acid (Format I), or the fragments are labeled, hybridized/ligated to the
oligonucleotide array and directly analyzed (Format II).

Figs. 15a, 15b, 15c, 15d, and 15e illustrate the analysis of differential display
DNA fragments on a generic difference screening array. Fig. 15a shows first strand cDNA
synthesis by reverse transcription of poly(A) mRNA using an anchored poly(T) primer. Fig.
15b illustrates upstream primers for PCR reaction containing an engineered restriction site
and degenerate bases (N=A,G,C,T) at the 3' end. Fig. 15c shows randomly primed PCR of
first strand cDNA. Fig. 15d shows restriction digest of PCR products, and Fig. 15e shows
sorting of PCR products on a generic ligation array by their 5' end.

Figs. 16a, 16b, and 16c illustrate the differences between replicate 1 and
replicate 2 for sample 1 and sample 2 nucleic acids. Fig. 16a shows the differences
between replicate 1 and replicate 2 for sample 1 (the normal cell line). Fig. 16b shows the
differences between replicate 1 and replicate 2 for sample 2 (the tumor cell line). Fig.
16c plots the differences between sample 1 and 2 averaged over the two replicates.

Figs. 17a, 17b, and 17c illustrate the data of Figs. 16a, 16b, and 16c
filtered. Fig. 17a shows the relative change in hybridization intensities of replicate 1 and
2 of sample 1 for the difference of each oligonucleotide pair. Fig. 17b shows the ratio of
replicate 1 and 2 of sample 2 for the difference of each oligonucleotide pair, normalized,
filtered, and plotted the same way as in Fig. 17a. Fig. 17c shows the ratio of sample 1
and sample 2 averaged over two replicates for the difference of each oligonucleotide pair.
The ratio is calculated as in Fig. 17a, but based on the absolute value of
[(X21k1 + X22k2)/2]/[(X11k1 + X12k2)/2] and [(X11k1 + X12k2)/2]/[(X21k1 + X22k2)/2]
after normalization as in Fig. 16c.

Fig. 18 illustrates post-fragmentation labeling using a CIAP treatment.

Fig. 19 provides a schematic illustration of post-hybridization end labeling
on a high density oligonucleotide array.

Fig. 20 provides a schematic illustration of end labeling utilizing pre-reaction
of a high density array prior to hybridization and end labeling.

Fig. 21 illustrates the results of a measure of post-hybridization TdTase end
labeling call accuracy.

Fig. 22 illustrates oligo dT labeling on a high density oligonucleotide array. 

Fig. 23 illustrates various labeling reagents suitable for use in the methods 
disclosed herein. Fig. 23a shows various labeling reagents. Fig. 23b shows still other 
labeling reagents. Fig. 23c shows non-ribose or non-2'-deoxyribose-containing labels. Fig.
23d shows sugar-modified nucleotide analogue labels.

Fig. 24 illustrates resequencing of a target DNA molecule with a set of
generic n-mer tiling probes.

Fig. 25 illustrates four tiling arrays present on a 4-mer generic array.

Fig. 26 illustrates base calling at the 8th position in the target.

Fig. 27 illustrates a base vote table.

Fig. 28 illustrates the effect of applying correctness score transform to HIV
data.

Fig. 29 illustrates mutation detection by intensity comparisons.

Fig. 30 illustrates bubble formation detection of mutation in the HIV
genome.

Fig. 31 illustrates induced difference nearest neighbor probe scoring.

Fig. 32 illustrates mutations found in an HIV PCR target (B) using a generic
ligation GeneChip™ and induced difference analysis.

Fig. 33 illustrates mutation detection using comparisons between a 
reference target and a sample target. 
DETAILED DESCRIPTION
I. Expression Monitoring and Generic Difference Screening.

This invention provides methods of expression monitoring and generic 
difference screening. The term expression monitoring is used to refer to the determination 
of levels of expression of particular, typically preselected, genes. In a preferred 
embodiment, the expression monitoring methods of this invention utilize high density 
arrays of oligonucleotides selected to be complementary to predetermined subsequences of 
the gene or genes whose expression levels are to be detected. Nucleic acid samples are 
hybridized to the arrays and the resulting hybridization signal provides an indication of the 
level of expression of each gene of interest. Because of the high degree of probe 
redundancy (typically there are multiple probes per gene) the expression monitoring 
methods provide an essentially accurate absolute measurement and do not require 
comparison to a reference nucleic acid. 

In another embodiment, this invention provides generic difference screening
methods that identify differences in the abundance (concentration) of particular nucleic
acids in two or more nucleic acid samples. The generic difference screening methods
involve hybridizing two or more nucleic acid samples to the same high density
oligonucleotide array, or to different high density oligonucleotide arrays having the same
oligonucleotide probe composition, and optionally the same oligonucleotide spatial
distribution. The resulting hybridizations are then compared, allowing determination of which
nucleic acids differ in abundance (concentration) between the two or more samples.
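
The comparison step can be sketched as follows. This is an illustrative sketch only: scaling by total array intensity, the two-fold cutoff, and the intensity floor are assumptions chosen for the example, not the normalization or thresholds described elsewhere in this disclosure, and all names are hypothetical.

```python
def differing_probes(sample_a, sample_b, min_fold=2.0, floor=1.0):
    """Return {probe: ratio} for probes whose signal differs between samples.

    sample_a, sample_b: dicts mapping probe sequence -> hybridization
    intensity, measured on arrays with the same probe composition.
    """
    if sample_a.keys() != sample_b.keys():
        raise ValueError("both samples must cover the same probes")
    # Assumed normalization: scale sample B to the total signal of sample A.
    scale = sum(sample_a.values()) / sum(sample_b.values())
    flagged = {}
    for probe, intensity_a in sample_a.items():
        a = max(intensity_a, floor)  # floor avoids division by near-zero signal
        b = max(sample_b[probe] * scale, floor)
        ratio = b / a
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            flagged[probe] = ratio
    return flagged
```

Because the sequence of each probe is known, the keys of the returned dictionary give direct sequence information about the nucleic acids that differ in abundance.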

Where the concentrations of the nucleic acids comprising the samples
reflect transcription levels of genes in a sample from which the nucleic acids are derived, the
generic difference screening methods permit identification of differences in transcription
(and by implication in expression) of the nucleic acids comprising the two or more
samples. The differentially (e.g., over- or under-) expressed nucleic acids thus identified
can be used (e.g., as probes) to determine and/or isolate those genes whose expression
levels differ between the two or more samples.

The generic difference screening methods are advantageous in that, in 
contrast to the expression monitoring methods, they require no a priori assumptions about 
the probe oligonucleotide composition of the array. To the contrary, the sequences of the 
probe oligonucleotides may be random, haphazard, or any arbitrary subset of
oligonucleotide probes. Where the oligonucleotide probes are short enough (e.g., less than
or equal to a 12 mer) the array may contain every possible nucleic acid of that length.
Despite the fact that the generic difference screening arrays might be arbitrary or random,
since the sequence of each probe in the array is known the generic difference screening
methods still provide direct sequence information regarding the differentially expressed
nucleic acids in the samples.
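
The size of an array containing every possible probe of a given length follows from the four-letter nucleotide alphabet: there are 4^n distinct n-mers. A short sketch (function names are hypothetical) enumerates them and counts them:

```python
from itertools import product


def all_probes(length, alphabet="ACGT"):
    """Lazily generate every possible probe sequence of the given length."""
    return ("".join(bases) for bases in product(alphabet, repeat=length))


def probe_count(length, alphabet_size=4):
    """Number of distinct probes of the given length (4**length for DNA)."""
    return alphabet_size ** length
```

A complete 8-mer array would thus carry 65,536 probes and a complete 12-mer array 16,777,216 probes.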

The expression monitoring and generic difference screening methods of this
invention involve providing an array containing a large number (e.g., greater than 1,000) of
arbitrarily selected different oligonucleotide probes (probe oligonucleotides) where the
sequence and location in the array of each different probe is known. Nucleic acid samples
(e.g., mRNA) are hybridized to the probe arrays and the pattern of hybridization is detected.

It is demonstrated herein and in copending applications U.S. Patent Serial
No. 08/529,115 filed on September 15, 1995 and PCT/US96/14839 that hybridization with
high density oligonucleotide probe arrays provides an effective means of detecting and/or 
quantifying the expression of particular nucleic acids in complex nucleic acid populations. 
The expression monitoring and difference screening methods of this invention may be used 
in a wide variety of circumstances including detection of disease, identification of 
differential gene expression between two samples (e.g., a pathological as compared to a 
healthy sample), screening for compositions that upregulate or downregulate the 
expression of particular genes, and so forth. 

In one preferred embodiment, the methods of this invention are used to 
monitor the expression (transcription) levels of nucleic acids whose expression is altered in 
a disease state. For example, a cancer may be characterized by the overexpression of a 
particular marker such as the HER2 (c-erbB-2/neu) proto-oncogene in the case of breast 
cancer. Similarly, overexpression of receptor tyrosine kinases (RTKs) is associated with 
the etiology of a number of tumors including carcinomas of the breast, liver, bladder, 
pancreas, as well as glioblastomas, sarcomas and squamous carcinomas (see Carpenter, 
Ann. Rev. Biochem., 56: 881-914 (1987)). Conversely, a cancer (e.g., colorectal, lung and
breast) may be characterized by the mutation of or underexpression of a tumor suppressor
gene such as P53 (see, e.g., Tominaga et al., Critical Rev. in Oncogenesis, 3: 257-282
(1992)).

Where the particular genes of interest are known, the high density arrays
will preferably contain probe oligonucleotides selected to be complementary to the
sequences or subsequences of those genes of interest. High probe redundancy for each
gene of interest can be achieved and absolute expression levels of each gene can be
determined.

Conversely, where it is unknown which genes differ in expression between
the healthy and disease state, the generic difference screening methods of this invention are
particularly appropriate. Hybridization of the healthy and pathological nucleic acids to the
generic difference screening arrays disclosed herein and comparison of the hybridization
patterns identifies those genes whose regulation is altered in the pathological state.

Similarly, the expression monitoring and generic difference screening
methods of this invention can be used to monitor expression of various genes in response
to defined stimuli, such as a drug, cell activation, etc. The methods are particularly
advantageous because they permit simultaneous monitoring of the expression of large
numbers of genes. This is especially useful in drug research if the end point description is
a complex one, not simply asking if one particular gene is overexpressed or
underexpressed. Thus, where a disease state or the mode of action of a drug is not well
characterized, the methods of this invention allow rapid determination of the particularly
relevant genes. Again, where the gene of interest is known or suspected, expression
monitoring methods will preferably be used, while generic screening methods will be used
when the particular genes of interest are unknown.

Using the generic difference screening methods disclosed herein, lack of
knowledge regarding the particular genes does not prevent identification of useful
therapeutics. For example, if the hybridization pattern on a particular high density array
for a healthy cell is known and significantly different from the pattern for a diseased cell,
then libraries of compounds can be screened for those that cause the pattern for a diseased
cell to become like that for the healthy cell. This provides a very detailed measure of the
cellular response to a drug.
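
One way to express "become like that for the healthy cell" numerically is to score each compound by the similarity between the healthy pattern and the pattern of treated diseased cells. The sketch below uses the Pearson correlation coefficient as the similarity measure; that choice, the function names, and the compound labels in the usage note are assumptions made for illustration and are not taken from this disclosure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length intensity vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant pattern has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def rank_compounds(healthy_pattern, treated_patterns):
    """Sort compounds by how closely the treated pattern tracks the healthy one.

    treated_patterns: dict mapping compound name -> intensities of diseased
    cells after treatment, in the same probe order as healthy_pattern.
    """
    scores = {
        name: pearson(healthy_pattern, pattern)
        for name, pattern in treated_patterns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, with a healthy pattern of [10, 200, 50, 400], a hypothetical "compound_A" yielding [12, 190, 55, 390] would rank above a hypothetical "compound_B" yielding [400, 50, 200, 10].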
Generic difference screening methods thus provide a powerful tool for gene
discovery and for elucidating mechanisms underlying complex cellular responses to
various stimuli. For example, in one embodiment, generic difference screening can be
used for "expression fingerprinting". Suppose it is found that the mRNA from a certain
cell type displays a distinct overall hybridization pattern that is different under different
conditions (e.g., when harboring mutations in particular genes, in a disease state). Then
this pattern of expression (an expression fingerprint), if reproducible and clearly
differentiable in the different cases, can be used as a very detailed diagnostic. It is not even
required that the pattern be fully interpretable, but just that it is specific for a particular cell
state (and preferably of diagnostic and/or prognostic relevance).

Both expression monitoring methods and generic difference screening may
also be used in drug safety studies. For example, if one is making a new antibiotic, then it
should not significantly affect the expression profile for mammalian cells. The
hybridization pattern could be used as a detailed measure of the effect of a drug on cells;
in other words, as a toxicological screen.

The expression monitoring and generic difference screening methods of this
invention are particularly well suited for gene discovery. For example, as explained above,
the generic difference screening methods identify differences in abundances of nucleic
acids in two or more samples. These differences may indicate changes in the expression
levels of previously unknown genes. The sequence information provided by a difference
screening array can be utilized, as described herein, to identify the unknown gene.

The expression monitoring methods can be used in gene discovery by
exploiting the fact that many genes that have been discovered to date have been classified
into families based on commonality of the sequences. Because of the extremely large
number of probes it is possible to place in the high density array, it is possible to include
oligonucleotide probes representing known or parts of known members from every gene
class. In utilizing such a "chip" (high density array), genes that are already known would
give a positive signal at loci containing both variable and common regions. For unknown
genes, only the common regions of the gene family would give a positive signal. The
result would indicate the possibility of a newly discovered gene.
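
The decision rule described above can be sketched as a small classifier. The use of a mean intensity compared against a single threshold to decide what counts as a "positive signal" is an assumption made for illustration, as are the function name and the returned labels.

```python
from statistics import mean


def classify_family_signal(common_intensities, variable_intensities, threshold):
    """Interpret signals from probes to a gene family's common and variable regions.

    A region is treated as positive when its mean probe intensity exceeds
    `threshold` (an assumed, user-supplied cutoff).
    """
    common_positive = mean(common_intensities) > threshold
    variable_positive = mean(variable_intensities) > threshold
    if common_positive and variable_positive:
        return "known family member"
    if common_positive:
        # Only the shared family sequence hybridizes: possible new gene.
        return "possible new family member"
    if variable_positive:
        return "inconclusive"
    return "no family member detected"
```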
The expression monitoring and generic difference screening methods of this
invention thus also allow the development of "dynamic" gene databases. The Human
Genome Project and commercial sequencing projects have generated large, static databases
which list thousands of sequences without regard to function or genetic interaction.
Analyses using the methods of this invention produce "dynamic" databases that define a
gene's function and its interactions with other genes. Without the ability to monitor the
expression of large numbers of genes simultaneously, or the ability to detect differences in
abundances of large numbers of "unknown" nucleic acids simultaneously, the work of
creating such a database is enormous.

Using DNA sequence analysis to determine an expression pattern is tedious.
It involves preparing a cDNA library from the RNA isolated from the
cells of interest and then sequencing the library. As the DNA is sequenced, the operator
lists the sequences that are obtained and counts them. Thousands of sequences would have 
to be determined and then the frequency of those gene sequences would define the 
expression pattern of genes for the cells being studied. 

By contrast, using an expression monitoring, or generic difference 
screening, array to obtain the data according to the methods of this invention is relatively 
fast and easy. For example, in one embodiment, cells may be stimulated to induce
expression. The RNA is obtained from the cells and then either labeled directly or a cDNA 
copy is created. Fluorescent molecules may be incorporated during the DNA 
polymerization. Either the labeled RNA or the labeled cDNA is then hybridized to a high 
density array in one overnight experiment. The hybridization provides a quantitative 
assessment of the levels of every single one of the hybridized nucleic acids with no 
additional sequencing. In addition, the methods of this invention are much more sensitive,
allowing a few copies of expressed genes per cell to be detected. This procedure is
demonstrated in the examples provided herein. These uses of the methods of this 
invention are intended to be illustrative and in no manner limiting. 

II. High Density Arrays For Generic Difference Screening and
Expression Monitoring.
As indicated above, this invention provides methods of monitoring
(detecting and/or quantifying) the expression levels of a large number of nucleic acids 
and/or determining differences in nucleic acid concentrations (abundances) between two or 
more samples. The methods involve hybridization of one or more nucleic acid samples
(target nucleic acids) to one or more high density arrays of nucleic acid probes and then 
quantifying the amount of target nucleic acids hybridized to each probe in the array. 

While nucleic acid hybridization has been used for some time to determine
the expression levels of various genes (e.g., Northern Blot), it was a surprising discovery of
this invention that high density arrays are suitable for the quantification of the small
variations in abundance (e.g., transcription and, by implication, expression) of a nucleic
acid (e.g., gene) in the presence of a large population of heterogeneous nucleic acids. The
signal (e.g., particular gene or gene product, or differentially abundant nucleic acid) may be
present at a concentration of less than about 1 in 1,000, and is often present at a
concentration less than 1 in 10,000, more preferably less than about 1 in 50,000 and most
preferably less than about 1 in 100,000, 1 in 300,000, or even 1 in 1,000,000.

The oligonucleotide arrays can have oligonucleotides as short as 10
nucleotides; more preferably oligonucleotides of 15 nucleotides and most preferably of 20
or 25 nucleotides are used to specifically detect and quantify nucleic acid expression levels.
Where ligation discrimination methods are used, the oligonucleotide arrays can contain
shorter oligonucleotides. In this instance, oligonucleotide arrays comprising
oligonucleotides ranging in length from 6 to 15 nucleotides, more preferably from about 8
to about 12 nucleotides in length are preferred. Of course arrays containing longer
oligonucleotides, as described herein, are also suitable.

The expression monitoring arrays, which are designed to detect particular 
preselected genes, provide for simultaneous monitoring of at least about 10, preferably at 
least about 100, more preferably at least about 1000, still more preferably at least about 
10,000, and most preferably at least about 100,000 different genes. 

A) Advantages of Oligonucleotide Arrays. 

In one preferred embodiment, the high density arrays used in the methods of 
this invention comprise chemically synthesized oligonucleotides. The use of chemically 
synthesized oligonucleotide arrays, as opposed to, for example, blotted arrays of genomic
clones, restriction fragments, oligonucleotides, and the like, offers numerous advantages. 
These advantages generally fall into four categories: 

1) Efficiency of production; 

2) Reduced intra- and inter-array variability; 

3) Increased information content; and 

4) Improved signal to noise ratio. 



1) Efficiency of production. 

In a preferred embodiment, the arrays are synthesized using methods of 
spatially addressed parallel synthesis (see, e.g., Section V, below). The oligonucleotides 
are synthesized chemically in a highly parallel fashion covalently attached to the array 
surface. This allows extremely efficient array production. For example, arrays containing 
any collection of tens (or even hundreds) of thousands of specifically selected 20 mer 
oligonucleotides are synthesized in fewer than 80 synthesis cycles. The arrays are designed 
and synthesized based on sequence information alone. Thus, unlike blotting methods, the 
array preparation requires no handling of biological materials. There is no need for cloning 
steps, nucleic acid purifications or amplifications, cataloging of clones or amplification 
products, and the like. The preferred chemical synthesis of high density oligonucleotide 
arrays in this invention is thus more efficient than blotting methods and permits the 
production of highly reproducible high-density arrays. 
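
The cycle count quoted above follows from the fact that each synthesis cycle couples one of the four bases at selected locations, so any set of N-mers can be built in at most 4 x N cycles (80 for 20 mers) regardless of how many different probes are on the array. The sketch below simulates a simple fixed schedule that offers A, C, G, and T in rotation; this schedule and the function name are assumptions made for illustration, and an optimized coupling order can need fewer cycles.

```python
def synthesis_cycles(probes, base_order="ACGT"):
    """Count coupling cycles needed to build all probes in parallel.

    Each cycle offers one base (cycling through `base_order`); every probe
    whose next required base matches is extended by one position.
    """
    for probe in probes:
        if any(base not in base_order for base in probe):
            raise ValueError(f"probe {probe!r} contains a base outside {base_order!r}")
    positions = [0] * len(probes)  # next position to synthesize, per probe
    cycles = 0
    while any(pos < len(probe) for pos, probe in zip(positions, probes)):
        base = base_order[cycles % len(base_order)]
        cycles += 1
        for index, probe in enumerate(probes):
            if positions[index] < len(probe) and probe[positions[index]] == base:
                positions[index] += 1
    return cycles
```

The result depends only on the probe sequences and not on the number of probes, which is why tens of thousands of different 20 mers require no more cycles than a single one.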

2) Reduced intra- and inter-array variability. 

The use of chemically synthesized high-density oligonucleotide arrays in the
methods of this invention reduces intra- and inter-array variability. The oligonucleotide
arrays preferred for this invention are made in large batches (presently 49 arrays per wafer
with multiple wafers synthesized in parallel) in a highly controlled reproducible manner.
This makes them suitable as general diagnostic and research tools permitting direct
comparisons of assays performed at different times and locations.

Because of the precise control obtainable during the chemical synthesis, the 
arrays of this invention show less than about 25%, preferably less than about 20%, more 
preferably less than about 15%, still more preferably less than about 10%, even more 
preferably less than about 5%, and most preferably less than about 2% variation between 
high density arrays (within or between production batches) having the same probe 
composition. Array variation is assayed as the variation in hybridization intensity (against 
a labeled control target nucleic acid mixture) in one or more oligonucleotide probes 
between two or more arrays. More preferably, array variation is assayed as the variation in 
hybridization intensity (against a labeled control target nucleic acid mixture) measured for 
one or more target genes between two or more arrays. 
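
By way of illustration only, the assay of array variation described above can be sketched in a few lines of Python. The sketch assumes that "variation" is expressed as the coefficient of variation (standard deviation divided by mean, as a percentage) of the hybridization intensity of each probe across arrays; that choice of statistic, and the names percent_variation and array_variation, are assumptions made for the example and are not specified in this disclosure.

```python
from statistics import mean, pstdev


def percent_variation(intensities):
    """Coefficient of variation (%) of one probe's intensity across arrays.

    Assumption: variation is expressed as population std / mean * 100.
    """
    avg = mean(intensities)
    if avg <= 0:
        raise ValueError("mean intensity must be positive")
    return 100.0 * pstdev(intensities) / avg


def array_variation(arrays):
    """Average per-probe variation (%) over probes present on every array.

    `arrays` is a list of dicts mapping a probe identifier to the
    hybridization intensity measured against the same labeled control target.
    """
    if len(arrays) < 2:
        raise ValueError("at least two arrays are required")
    shared = set(arrays[0]).intersection(*arrays[1:])
    if not shared:
        raise ValueError("arrays share no probes")
    return mean(percent_variation([a[p] for a in arrays]) for p in shared)
```

Under these assumptions, two arrays reading 90 and 110 for the same probe show 10% variation for that probe, which would satisfy the "less than about 15%" criterion but not the "less than about 5%" criterion.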

In addition to reducing inter- and intra-array variability, chemically 
synthesized arrays also reduce variations in relative probe frequency inherent in spotting 
methods, particularly spotting methods that use cell-derived nucleic acids (e.g., cDNAs). 
Many genes are expressed at the level of thousands of copies per cell, while others are 
expressed at only a single copy per cell. A cDNA library will reflect this very large bias as 
will a cDNA library made from this material. While normalization (adjustment of the 
amount of each different probe e.g., by comparison to a reference cDNA) of the library will 
reduce the representation of over-expressed sequences to some extent, normalization has 
been shown to lessen the odds of selecting highly expressed cDNAs by only about a factor 
of 2 or 3. In contrast, chemical synthesis methods can ensure that all oligonucleotide 
probes are represented in approximately equal concentrations. This decreases the 
inter-gene (intra-array) variability and permits direct comparison between hybridization signals 
for different oligonucleotide probes. 

3) Increased information content. 

i) Advantages for expression monitoring. 

The use of high density oligonucleotide arrays for expression monitoring 
provides a number of advantages not found with other methods. For example, the use of 
large numbers of different probes that specifically bind to the transcription product of a 
particular target gene provides a high degree of redundancy and internal control that 
permits optimization of probe sets for effective detection of particular target genes and 
minimizes the possibility of errors due to cross-reactivity with other nucleic acid species. 

Apparently suitable probes often prove ineffective for expression 
monitoring by hybridization. For example, certain subsequences of a particular target gene 
may be found in other regions of the genome and probes directed to these subsequences 
will cross-hybridize with the other regions and not provide a signal that is a meaningful 
measure of the expression level of the target gene. Even probes that show little cross 
reactivity may be unsuitable because they generally show poor hybridization due to the 
formation of structures that prevent effective hybridization. Finally, in sets with large 
numbers of probes, it is difficult to identify hybridization conditions that are optimal for all 
the probes in a set. Because of the high degree of redundancy provided by the large 
number of probes for each target gene, it is possible to eliminate those probes that function 
poorly under a given set of hybridization conditions and still retain enough probes to a 
particular target gene to provide an extremely sensitive and reliable measure of the 
expression level (transcription level) of that gene. 

In addition, the use of large numbers of different probes to each target gene 
makes it possible to monitor expression of families of closely-related nucleic acids. The 
probes may be selected to hybridize both with subsequences that are conserved across the 
family and with subsequences that differ in the different nucleic acids in the family. Thus, 
hybridization with such arrays permits simultaneous monitoring of the various members of 
a gene family even where the various genes are approximately the same size and have high 
levels of homology. Such measurements are difficult or impossible with traditional 
hybridization methods. 



ii) General advantages. 
Because the high density arrays contain such a large number of probes, it is 
possible to provide numerous controls including, for example, controls for variations or 
mutations in a particular gene, controls for overall hybridization conditions, controls for 
sample preparation conditions, controls for metabolic activity of the cell from which the 
nucleic acids are derived and mismatch controls for non-specific binding or cross 
hybridization. 

Effective detection and quantitation of gene transcription in complex 
mammalian cell message populations can be achieved with relatively short 
oligonucleotides and with relatively few (e.g., fewer than 40, preferably fewer than 30, more 
preferably fewer than 25, and most preferably fewer than 20, 15, or even 10) 
oligonucleotide probes per gene. There are a large number of probes which hybridize both 
strongly and specifically for each gene. This does not mean that a large number of probes 
is required for detection, but rather that there are many from which to choose and that 
choices can be based on other considerations such as sequence uniqueness (gene families), 
checking for splice variants, or genotyping hot spots (things not easily done with cDNA 
spotting methods). 

In use, sets of four arrays for expression monitoring are made that contain 
approximately 400,000 probes each. Sets of about 40 probes (20 probe pairs) are chosen 
that are complementary to each of about 40,000 genes for which there are ESTs in the 
public database. This set of ESTs covers roughly one-third to one-half of all human genes 
and these arrays will allow the levels of all of them to be monitored in a parallel set of 
overnight hybridizations. 
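
The probe budget in the preceding paragraph can be checked with simple arithmetic. The following Python sketch merely restates the approximate figures given above (four arrays, about 400,000 probes each, about 40 probes per gene, about 40,000 genes); the constant names are chosen for the example.

```python
# Approximate figures taken from the paragraph above.
ARRAYS_PER_SET = 4
PROBES_PER_ARRAY = 400_000
PROBES_PER_GENE = 40        # i.e., 20 probe pairs
GENES_WITH_ESTS = 40_000

# Total probe capacity of one set of arrays.
set_capacity = ARRAYS_PER_SET * PROBES_PER_ARRAY

# Probes needed to cover every gene at the stated redundancy.
probes_needed = PROBES_PER_GENE * GENES_WITH_ESTS

# Number of genes that fit on a single array at that redundancy.
genes_per_array = PROBES_PER_ARRAY // PROBES_PER_GENE
```

Both totals come to about 1,600,000 probes, so a set of four such arrays holds the probes for about 40,000 genes, at about 10,000 genes per array.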



4) Improved signal to noise ratio. 

Blotted nucleic acids sometimes rely on ionic, electrostatic, and 
hydrophobic interactions to attach the blotted nucleic acids to the substrate. Bonds are 
formed at multiple points along the nucleic acid, restricting degrees of freedom and 
interfering with the ability of the nucleic acid to hybridize to its complementary target. In 
contrast, the preferred arrays of this invention are chemically synthesized. The 
oligonucleotide probes are attached to the substrate by a single terminal covalent bond. 
The probes have more degrees of freedom and are capable of participating in complex 
interactions with their complementary targets. Consequently, the probe arrays of this 
invention show significantly higher hybridization efficiencies (10 times, 100 times, and 
even 1000 times more efficient) than blotted arrays. Less target oligonucleotide is used to 
produce a given signal thereby dramatically improving the signal to noise ratio. 
Consequently the methods of this invention permit detection of only a few copies of a 
nucleic acid in extremely complex nucleic acid mixtures. 

B) Preferred High Density Arrays. 

Preferred high density arrays of this invention comprise greater than about 
100, preferably greater than about 1000, more preferably greater than about 16,000 and 
most preferably greater than about 65,000 or 250,000 or even greater than about 1,000,000 
different oligonucleotide probes. The oligonucleotide probes range from about 5 to about 
50 or about 5 to about 45 nucleotides, more preferably from about 10 to about 40 
nucleotides and most preferably from about 15 to about 40 nucleotides in length. In 
particular preferred embodiments, the oligonucleotide probes are 20 or 25 nucleotides in 
length, while in other preferred embodiments (particularly where ligation discrimination 
reactions are used) the oligonucleotide probes are preferably shorter (e.g., 6 to 20, more 
preferably 8 to 15 nucleotides in length). It was a discovery of this invention that relatively 
short oligonucleotide probes are sufficient to specifically hybridize to and distinguish target 
sequences. Thus in one preferred embodiment, the oligonucleotide probes are less than 50 
nucleotides in length, generally less than 46 nucleotides, more generally less than 41 
nucleotides, most generally less than 36 nucleotides, preferably less than 31 nucleotides, 
more preferably less than 26 nucleotides, and most preferably less than 21 nucleotides in 
length. The probes can also be less than 16 nucleotides, less than 13 nucleotides in length, 
less than 9 nucleotides in length and less than 7 nucleotides in length. It is also recognized 
that the oligonucleotide probes can be relatively long, ranging in length up to about 1000 
nucleotides, more typically up to about 500 nucleotides in length. 

The location and, in some embodiments, sequence of each different 
oligonucleotide probe in the array is known. Moreover, the large number of different 
probes occupies a relatively small area providing a high density array having a probe 
density of generally greater than about 60, more generally greater than about 100, most 
generally greater than about 600, often greater than about 1000, more often greater than 
about 5,000, most often greater than about 10,000, preferably greater than about 40,000 
more preferably greater than about 100,000, and most preferably greater than about 
400,000 different oligonucleotide probes per cm². The small surface area of the array 
(often less than about 10 cm², preferably less than about 5 cm², more preferably less than 
about 2 cm², and most preferably less than about 1.6 cm²) permits the use of small sample 
volumes and extremely uniform hybridization conditions (temperature regulation, salt 
content, etc.) while the extremely large number of probes allows massively parallel 
processing of hybridizations. 

Finally, because of the small area occupied by the high density arrays, 
hybridization may be carried out in extremely small fluid volumes (e.g., 250 µl or less, 
more preferably 100 µl or less, and most preferably 10 µl or less). In addition, 
hybridization conditions are extremely uniform throughout the sample, and the 
hybridization format is amenable to automated processing. 
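
The relationship between probe density, array area and feature size can be illustrated with a short calculation. The Python sketch below assumes, purely for the example, that each probe occupies a square feature and that the features tile the surface without gaps; the function names are hypothetical.

```python
import math


def probes_on_array(density_per_cm2, area_cm2):
    """Number of different probes on an array of the given density and area."""
    return density_per_cm2 * area_cm2


def feature_pitch_um(density_per_cm2):
    """Center-to-center feature spacing in micrometers.

    Assumes square features that tile the surface; 1 cm = 10,000 micrometers.
    """
    return 1e4 / math.sqrt(density_per_cm2)
```

Under these assumptions, a density of 400,000 probes per cm² on a 1.6 cm² array corresponds to about 640,000 different probes, each occupying a feature roughly 16 micrometers on a side.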

III. Monitoring Gene Expression and Generic Difference Screening. 

As explained above, this invention provides methods for monitoring gene 
expression (expression monitoring) and for identifying differences in abundance 
(concentration) of nucleic acids in two or more nucleic acid samples (generic difference 
screening). Generally the methods of monitoring gene expression of this invention involve 
(1) providing a pool of target nucleic acids comprising RNA transcript(s) of one or more 
target gene(s), or nucleic acids derived from the RNA transcript(s); (2) hybridizing the 
nucleic acid sample to a high density array of probes (including control probes); and (3) 
detecting the hybridized nucleic acids and calculating a relative expression (transcription) 
level. These methods preferably involve the use of high density oligonucleotide arrays 
containing probes to specifically preselected genes. 

In contrast, the arrays used in the generic difference screening methods of 
this invention do not require that specific target genes be identified. To the contrary, the 
methods are designed to detect changes or differences in expression of various genes where 
the particular gene to be identified is unknown prior to performing the difference 
screening. 

The methods of generic difference screening typically involve the steps of: 
1) providing one or more high density oligonucleotide arrays (preferably including probe 
pairs differing in one or more nucleotides); 2) providing two or more nucleic acid samples; 
3) hybridizing the nucleic acid samples to one or more arrays to form hybrid duplexes 
between nucleic acids in the nucleic acid samples and probe oligonucleotides in the 
array(s); 4) detecting the hybridization of the nucleic acids to the arrays; and 5) 
determining the differences in hybridization between the nucleic acid samples. 

The provision of a nucleic acid sample, the hybridization of the sample to 
the arrays, and detection of the hybridized nucleic acid(s) are performed in essentially the 
same manner in expression monitoring and in generic difference screening methods. As 
disclosed herein, in preferred embodiments, the methods are distinguished, in part, by 
oligonucleotide probe selection, in the use of at least two nucleic acid samples in generic 
difference screening, and in subsequent analysis. 
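
The final step of generic difference screening, determining the differences in hybridization between samples, might be sketched as follows. This Python example is a minimal illustration only: it assumes that each sample has been reduced to a mapping from probe identifier to hybridization intensity, and the two-fold threshold, the noise floor, and the function name differential_probes are assumptions introduced for the example rather than parameters taught here.

```python
def differential_probes(sample_a, sample_b, min_fold=2.0, floor=1.0):
    """Return {probe: ratio a/b} for probes whose signal differs between samples.

    sample_a, sample_b: dicts mapping probe id -> hybridization intensity.
    min_fold: hypothetical threshold; a probe is reported when the ratio is
              at least min_fold or at most 1 / min_fold.
    floor:    hypothetical noise floor applied before taking the ratio so
              that very weak signals do not produce extreme ratios.
    """
    flagged = {}
    for probe in sorted(sample_a.keys() & sample_b.keys()):
        a = max(sample_a[probe], floor)
        b = max(sample_b[probe], floor)
        ratio = a / b
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            flagged[probe] = ratio
    return flagged
```

Note that no knowledge of which gene a probe represents is needed to flag it, which is consistent with the generic nature of the screen.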

A) Providing a Nucleic Acid Sample. 

In order to measure the nucleic acid concentration in a sample, it is 
desirable to provide a nucleic acid sample for such analysis. Where it is desired that the 
nucleic acid concentration, or differences in nucleic acid concentration between different 
samples, reflect transcription levels or differences in transcription levels of a gene or genes, 
it is desirable to provide a nucleic acid sample comprising mRNA transcript(s) of the gene 
or genes, or nucleic acids derived from the mRNA transcript(s). As used herein, a nucleic 
acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the 
mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a 
cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA 
amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all 
derived from the mRNA transcript and detection of such derived products is indicative of 
the presence and/or abundance of the original transcript in a sample. Thus, suitable 
samples include, but are not limited to, mRNA transcripts of the gene or genes, cDNA 
reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified 
from the genes, RNA transcribed from amplified DNA, and the like. 

In a particularly preferred embodiment, where it is desired to quantify the 
transcription level (and thereby expression) of one or more genes in a sample, the nucleic 
acid sample is one in which the concentration of the mRNA transcript(s) of the gene or 
genes, or the concentration of the nucleic acids derived from the mRNA transcript(s), is 
proportional to the transcription level (and therefore expression level) of that gene. 
Similarly, it is preferred that the hybridization signal intensity be proportional to the 
amount of hybridized nucleic acid. While it is preferred that the proportionality be 
relatively strict (e.g., a doubling in transcription rate results in a doubling in mRNA 
transcript in the sample nucleic acid pool and a doubling in hybridization signal), one of 
skill will appreciate that the proportionality can be more relaxed and even non-linear. 
Thus, for example, an assay where a 5 fold difference in concentration of the target mRNA 
results in a 3 to 6 fold difference in hybridization intensity is sufficient for most purposes. 
Where more precise quantification is required, appropriate controls can be run to correct for 
variations introduced in sample preparation and hybridization as described herein. In 
addition, serial dilutions of "standard" target mRNAs can be used to prepare calibration 
curves according to methods well known to those of skill in the art. Of course, where 
simple detection of the presence or absence of a transcript or large differences or changes 
in nucleic acid concentration is desired, no elaborate control or calibration is required. 
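
One way in which a calibration curve might be prepared from serial dilutions of a standard target is sketched below in Python. The sketch assumes a power-law relationship between concentration and hybridization signal, fitted as a straight line on log-log axes by ordinary least squares; this model, and the function names fit_loglog and estimate_concentration, are assumptions made for illustration, and other curve forms well known in the art may be preferred.

```python
import math


def fit_loglog(concentrations, signals):
    """Least-squares fit of log10(signal) = intercept + slope * log10(conc).

    Returns (slope, intercept). A slope near 1 indicates strict
    proportionality; other slopes describe a relaxed, non-linear response.
    """
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(signal, slope, intercept):
    """Invert the fitted curve to estimate concentration from a signal."""
    return 10 ** ((math.log10(signal) - intercept) / slope)
```

A measured signal from an unknown sample is then converted to an estimated concentration by reading it off the fitted curve, within the range spanned by the dilution series.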

In the simplest embodiment, such a nucleic acid sample is the total mRNA 
or a total cDNA isolated and/or otherwise derived from a biological sample. The term 
"biological sample", as used herein, refers to a sample obtained from an organism or from 
components (e.g., cells) of an organism. The sample may be of any biological tissue or 
fluid. Frequently the sample will be a "clinical sample" which is a sample derived from a 
patient. Such samples include, but are not limited to, sputum, blood, blood cells (e.g., 
white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, 
or cells therefrom. Biological samples may also include sections of tissues such as frozen 
sections taken for histological purposes. 

The nucleic acid (either genomic DNA or mRNA) may be isolated from the 
sample according to any of a number of methods well known to those of skill in the art. 
One of skill will appreciate that where alterations in the copy number of a gene are to be 
detected genomic DNA is preferably isolated. Conversely, where expression levels of a 
gene or genes are to be detected, preferably RNA (mRNA) is isolated. 

Methods of isolating total mRNA are well known to those of skill in the art. 
For example, methods of isolation and purification of nucleic acids are described in detail 
in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology: 
Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, P. 
Tijssen, ed. Elsevier, N.Y. (1993). 

In a preferred embodiment, the total nucleic acid is isolated from a given 
sample using, for example, an acid guanidinium-phenol-chloroform extraction method and 
polyA+ mRNA is isolated by oligo dT column chromatography or by using (dT)n magnetic 
beads (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.), 
Vols. 1-3, Cold Spring Harbor Laboratory, (1989), or Current Protocols in Molecular 
Biology, F. Ausubel et al., ed. Greene Publishing and Wiley-Interscience, New York 
(1987)). 

Frequently, it is desirable to amplify the nucleic acid sample prior to 
hybridization. One of skill in the art will appreciate that whatever amplification method is 
used, if a quantitative result is desired, care must be taken to use a method that maintains 
or controls for the relative frequencies of the amplified nucleic acids. 

Methods of "quantitative" amplification are well known to those of skill in 
the art. For example, quantitative PCR involves simultaneously co-amplifying a known 
quantity of a control sequence using the same primers. This provides an internal standard 
that may be used to calibrate the PCR reaction. The high density array may then include 
probes specific to the internal standard for quantification of the amplified nucleic acid. 

One preferred internal standard is a synthetic AW106 cRNA. The AW106 
cRNA is combined with RNA isolated from the sample according to standard techniques 
known to those of skill in the art. The RNA is then reverse transcribed using a reverse 
transcriptase to provide copy DNA. The cDNA sequences are then amplified (e.g., by 
PCR) using labeled primers. The amplification products are separated, typically by 
electrophoresis, and the amount of radioactivity (proportional to the amount of amplified 
product) is determined. The amount of mRNA in the sample is then calculated by 
comparison with the signal produced by the known AW106 RNA standard. Detailed 
protocols for quantitative PCR are provided in PCR Protocols, A Guide to Methods and 
Applications, Innis et al., Academic Press, Inc. N.Y., (1990). 
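
The comparison against the internal standard can be expressed as a simple proportion. The Python sketch below assumes that the sample transcript and the standard are amplified with equal efficiency (as is intended when the same primers are used) and that signal is proportional to the amount of amplified product; the function name and argument names are hypothetical.

```python
def estimate_target_amount(sample_signal, standard_signal, standard_amount):
    """Estimate the amount of target mRNA from an internal standard.

    Assumes equal amplification efficiency for target and standard and a
    signal proportional to amplified product, so that
        target_amount / standard_amount = sample_signal / standard_signal.
    The result is in the same units as standard_amount.
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return standard_amount * sample_signal / standard_signal
```

For example, if the sample band gives three times the signal of a standard added at a known amount, the sample is estimated to contain three times that amount of target.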

Other suitable amplification methods include, but are not limited to, 
polymerase chain reaction (PCR) (Innis, et al., PCR Protocols. A Guide to Methods and 
Applications. Academic Press, Inc. San Diego, (1990)), ligase chain reaction (LCR) (see 
Wu and Wallace, Genomics, 4: 560 (1989), Landegren, et al., Science, 241: 1077 (1988) 
and Barringer, et al., Gene, 89: 117 (1990)), transcription amplification (Kwoh, et al., Proc. 
Natl. Acad. Sci. USA, 86: 1173 (1989)), and self-sustained sequence replication (Guatelli, 
et al., Proc. Natl. Acad. Sci. USA, 87: 1874 (1990)). 

In a particularly preferred embodiment, the sample mRNA is reverse 
transcribed with a reverse transcriptase and a primer consisting of oligo dT and a sequence 
encoding the phage T7 promoter to provide single stranded DNA template. The second 
DNA strand is polymerized using a DNA polymerase. After synthesis of double-stranded 
cDNA, T7 RNA polymerase is added and RNA is transcribed from the cDNA template. 
Successive rounds of transcription from each single cDNA template result in amplified 
RNA. Methods of in vitro polymerization are well known to those of skill in the art (see, 
e.g., Sambrook, supra.) and this particular method is described in detail by Van Gelder, et 
al., Proc. Natl. Acad. Sci. USA, 87: 1663-1667 (1990) who demonstrate that in vitro 
amplification according to this method preserves the relative frequencies of the various 
RNA transcripts. Moreover, Eberwine et al., Proc. Natl. Acad. Sci. USA, 89: 3010-3014 
provide a protocol that uses two rounds of amplification via in vitro transcription to 
achieve greater than 10^6 fold amplification of the original starting material thereby 
permitting expression monitoring even where biological samples are limited. 

It will be appreciated by one of skill in the art that the direct transcription 
method described above provides an antisense RNA (aRNA) pool. Where antisense RNA is 
used as the target nucleic acid, the oligonucleotide probes provided in the array are chosen 
to be complementary to subsequences of the antisense nucleic acids. Conversely, where 
the target nucleic acid pool is a pool of sense nucleic acids, the oligonucleotide probes are 
selected to be complementary to subsequences of the sense nucleic acids. Finally, where 
the nucleic acid pool is double stranded, the probes may be of either sense as the target 
nucleic acids include both sense and antisense strands. 
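
The dependence of probe sequence on the sense of the target pool can be illustrated with a short Python sketch. It assumes standard Watson-Crick pairing and DNA probes written 5' to 3'; the function names reverse_complement and probe_for_target, and the strand labels "sense", "antisense" and "double", are introduced only for this example.

```python
# Complement table covering both DNA and RNA input; output is DNA.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Return the DNA reverse complement (5'->3') of a DNA or RNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def probe_for_target(sense_subsequence, target_strand):
    """Choose a DNA probe for a subsequence given in the sense (mRNA) strand.

    target_strand: "antisense" -> the probe has the sense sequence, which is
                                  complementary to the antisense target;
                   "sense"     -> the probe is the reverse complement;
                   "double"    -> either sense hybridizes; the sense
                                  sequence is returned by convention.
    """
    sense_dna = sense_subsequence.upper().replace("U", "T")
    if target_strand in ("antisense", "double"):
        return sense_dna
    if target_strand == "sense":
        return reverse_complement(sense_dna)
    raise ValueError("target_strand must be 'sense', 'antisense' or 'double'")
```

Thus a single table of sense-strand subsequences can serve as the basis for designing probes against either kind of target pool.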

The protocols cited above include methods of generating pools of either 
sense or antisense nucleic acids. Indeed, one approach can be used to generate either sense 
or antisense nucleic acids as desired. For example, the cDNA can be directionally cloned 
into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid) such that it is flanked by 
the T3 and T7 promoters. In vitro transcription with the T3 polymerase will produce RNA 
of one sense (the sense depending on the orientation of the insert), while in vitro 
transcription with the T7 polymerase will produce RNA having the opposite sense. Other 
suitable cloning systems include phage lambda vectors designed for Cre-loxP plasmid 
subcloning (see e.g., Palazzolo et al., Gene, 88: 25-36 (1990)). 

In a particularly preferred embodiment, a high activity RNA polymerase 
(e.g., about 2500 units/µL for T7, available from Epicentre Technologies) is used. 

B) Labeling nucleic acids. 

i) Labeling methods/strategies. 

In a preferred embodiment, the hybridized nucleic acids are detected by 
detecting one or more labels attached to the sample nucleic acids. The labels may be 
incorporated by any of a number of means well known to those of skill in the art. 
However, in a preferred embodiment, the label is simultaneously incorporated during the 
amplification step in the preparation of the sample nucleic acids. For example, polymerase 
chain reaction (PCR) with labeled primers or labeled nucleotides will provide a labeled 
amplification product. The nucleic acid (e.g., DNA) is amplified in the presence of 
labeled deoxynucleotide triphosphates (dNTPs). The amplified nucleic acid can be 
fragmented, exposed to an oligonucleotide array, and the extent of hybridization 
determined by the amount of label now associated with the array. In a preferred 
embodiment, transcription amplification, as described above, using a labeled nucleotide 
(e.g. fluorescein-labeled UTP and/or CTP) incorporates a label into the transcribed nucleic 
acids. 

Alternatively, a label may be added directly to the original nucleic acid 
sample (e.g., mRNA, poly A mRNA, cDNA, etc.) or to the amplification product after the 
amplification is completed. Such labeling can result in the increased yield of amplification 
products and reduce the time required for the amplification reaction. Means of attaching 
labels to nucleic acids include, for example, nick translation or end-labeling (e.g. with a 
labeled RNA) by kinasing of the nucleic acid and subsequent attachment (ligation) of a 
nucleic acid linker joining the sample nucleic acid to a label (e.g., a fluorophore). End 
labeling is discussed in more detail below in Section III(B)(iii). 

Detectable labels suitable for use in the present invention include any 
composition detectable by spectroscopic, photochemical, biochemical, immunochemical, 
electrical, optical or chemical means. Useful labels in the present invention include biotin 
for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), 
fluorescent dyes (e.g., fluorescein, texas red, rhodamine, green fluorescent protein, and the 
like, see, e.g., Molecular Probes, Eugene, Oregon, USA), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, 
or ³²P), enzymes (e.g., horse radish peroxidase, alkaline phosphatase and others commonly 
used in an ELISA), and colorimetric labels such as colloidal gold (e.g., gold particles in the 
40-80 nm diameter size range scatter green light with high efficiency) or colored glass or 
plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such 
labels include U.S. Patent Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 
4,275,149; and 4,366,241. 

A fluorescent label is preferred because it provides a very strong signal with 
low background. It is also optically detectable at high resolution and sensitivity through a 
quick scanning procedure. The nucleic acid samples can all be labeled with a single label, 
e.g., a single fluorescent label. Alternatively, in another embodiment, different nucleic acid 
samples can be simultaneously hybridized where each nucleic acid sample has a different 
label. For instance, one target could have a green fluorescent label and a second target 
could have a red fluorescent label. The scanning step will distinguish sites of binding of 
the red label from those binding the green fluorescent label. Each nucleic acid sample 
(target nucleic acid) can be analyzed independently from one another. 

Suitable chromogens which can be employed include those molecules and 
compounds which absorb light in a distinctive range of wavelengths so that a color can be 
observed or, alternatively, which emit light when irradiated with radiation of a particular 
wave length or wave length range, e.g., fluorescers. 

A wide variety of suitable dyes are available, being primarily chosen to 
provide an intense color with minimal absorption by their surroundings. Illustrative dye 
types include quinoline dyes, triarylmethane dyes, acridine dyes, alizarine dyes, phthaleins, 
insect dyes, azo dyes, anthraquinoid dyes, cyanine dyes, phenazathionium dyes, and 
phenazoxonium dyes. 

A wide variety of fluorescers can be employed either alone or, 
alternatively, in conjunction with quencher molecules. Fluorescers of interest fall into a 
variety of categories having certain primary functionalities. These primary functionalities 
include 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary 
phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, 
oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bisbenzoxazole, 
bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts, 
hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, 
indole, xanthen, 7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins, 
triarylmethanes and flavin. Individual fluorescent compounds which have functionalities 
for linking or which can be modified to incorporate such functionalities include, e.g., 
dansyl chloride; fluoresceins such as 3,6-dihydroxy-9-phenylxanthhydrol; 
rhodamineisothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl 
2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2'-disulfonic 
acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl, N-methyl 
2-aminonaphthalene-6-sulfonate; ethidium bromide; stebrine; 
auromine-0,2-(9'-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N'-dioctadecyl 
oxacarbocyanine; N,N'-dihexyl oxacarbocyanine; merocyanine, 4(3'pyrenyl)butyrate; 
d-3-aminodesoxy-equilenin; 12-(9'anthroyl)stearate; 2-methylanthracene; 
9-vinylanthracene; 2,2'(vinylene-p-phenylene)bisbenzoxazole; 
p-bis[2-(4-methyl-5-phenyl-oxazolyl)]benzene; 6-dimethylamino-1,2-benzophenazin; retinol; 
bis(3'-aminopyridinium) 1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellebrigenin; 
chlorotetracycline; N(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; 
N-[p-(2-benzimidazolyl)-phenyl]maleimide; N-(4-fluoranthyl)maleimide; bis(homovanillic acid); 
resazarin; 4-chloro-7-nitro-2,1,3-benzooxadiazole; merocyanine 540; resorufin; rose bengal; 
and 2,4-diphenyl-3(2H)-furanone. 

Desirably, fluorescers should absorb light above about 300 nm, preferably 
above about 350 nm, and more preferably above about 400 nm, usually emitting at wavelengths 
greater than about 10 nm higher than the wavelength of the light absorbed. It should be 
noted that the absorption and emission characteristics of the bound dye can differ from the 
unbound dye. Therefore, when referring to the various wavelength ranges and 
characteristics of the dyes, it is intended to indicate the dyes as employed and not the dye 
which is unconjugated and characterized in an arbitrary solvent. 

Fluorescers are generally preferred because by irradiating a fluorescer with 
light, one can obtain a plurality of emissions. Thus, a single label can provide for a 
plurality of measurable events. 

Detectable signal can also be provided by chemiluminescent and 
bioluminescent sources. Chemiluminescent sources include a compound which becomes 
electronically excited by a chemical reaction and can then emit light which serves as the 
detectable signal or donates energy to a fluorescent acceptor. A diverse number of families 
of compounds have been found to provide chemiluminescence under a variety of 
conditions. One family of compounds is 2,3-dihydro-1,4-phthalazinedione. The most 
popular compound is luminol, which is the 5-amino compound. Other members of the 
family include the 5-amino-6,7,8-trimethoxy- and the dimethylamino[ca]benz analog. 
These compounds can be made to luminesce with alkaline hydrogen peroxide or calcium 
hypochlorite and base. Another family of compounds is the 2,4,5-triphenylimidazoles, 
with lophine as the common name for the parent product. Chemiluminescent analogs 
include para-dimethylamino and -methoxy substituents. Chemiluminescence can also be 
obtained with oxalates, usually oxalyl active esters, e.g., p-nitrophenyl and a peroxide, e.g., 
hydrogen peroxide, under basic conditions. Alternatively, luciferins can be used in 
conjunction with luciferase or lucigenins to provide bioluminescence. 

Spin labels are provided by reporter molecules with an unpaired electron
spin which can be detected by electron spin resonance (ESR) spectroscopy. Exemplary
spin labels include organic free radicals, transition metal complexes, particularly
vanadium, copper, iron, and manganese, and the like. Exemplary spin labels include
nitroxide free radicals.

The label may be added to the target (sample) nucleic acid(s) prior to, or
after the hybridization. So-called "direct labels" are detectable labels that are directly
attached to or incorporated into the target (sample) nucleic acid prior to hybridization. In
contrast, so-called "indirect labels" are joined to the hybrid duplex after hybridization.
Often, the indirect label is attached to a binding moiety that has been attached to the target
nucleic acid prior to the hybridization. Thus, for example, the target nucleic acid may be
biotinylated before the hybridization. After hybridization, an avidin-conjugated
fluorophore will bind the biotin-bearing hybrid duplexes providing a label that is easily
detected. For a detailed review of methods of labeling nucleic acids and detecting labeled
hybridized nucleic acids see Laboratory Techniques in Biochemistry and Molecular
Biology, Vol 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y.,
(1993).

Fluorescent labels are preferred and easily added during an in vitro
transcription reaction. In a preferred embodiment, fluorescein labeled UTP and CTP are
incorporated into the RNA produced in an in vitro transcription reaction as described
above.

The labels can be attached directly or through a linker moiety. In general,
the site of label or linker-label attachment is not limited to any specific position. For
example, a label may be attached to a nucleoside, nucleotide, or analogue thereof at any
position that does not interfere with detection or hybridization as desired. For example,
certain Label-ON Reagents from Clontech (Palo Alto, CA) provide for labeling
interspersed throughout the phosphate backbone of an oligonucleotide and for terminal
labeling at the 3' and 5' ends. As shown for example herein, labels can be attached at
positions on the ribose ring or the ribose can be modified and even eliminated as desired.
The base moieties of useful labeling reagents can include those that are naturally occurring
or modified in a manner that does not interfere with the purpose to which they are put.
Modified bases include but are not limited to 7-deaza A and G, 7-deaza-8-aza A and G,
and other heterocyclic moieties.



End-labeling nucleic acids.

In many applications it is useful to directly label nucleic acid samples 
without having to go through an amplification, transcription or other nucleic acid 
conversion step. This is especially true for monitoring of mRNA levels where one would
like to extract total cytoplasmic RNA or poly A+ RNA (mRNA) from cells and hybridize
this material without any intermediate steps that could skew the original distribution of
mRNA concentrations.

In general, end-labeling methods permit the optimization of the size of the
nucleic acid to be labeled. End-labeling methods also decrease the sequence bias
sometimes associated with polymerase-facilitated labeling methods. End labeling can be
performed using terminal transferase (TdT).

End labeling can also be accomplished by ligating a labeled oligonucleotide 
or analog thereof to the end of a target nucleic acid or probe. Other end-labeling methods 
include the creation of a labeled or unlabeled "tail" for the nucleic acid using ligase or 
terminal transferase, for example. The tailed nucleic acid is then exposed to a labeled 
moiety that will preferentially associate with the tail. The tail and the moiety that 
preferentially associates with the tail can be a polymer such as a nucleic acid, peptide, or 
carbohydrate. The tail and its recognition moiety can be anything that permits recognition 
between the two, and includes molecules having ligand-substrate relationships such as 
haptens, epitopes, antibodies, enzymes and their substrates, and complementary nucleic
acids and analogs thereof.

The labels associated with the tail or the tail recognition moiety include 
detectable moieties. When the tail and its recognition moiety are both labeled, the 
respective labels associated with each can themselves have a ligand-substrate relationship. 
The respective labels can also comprise energy transfer reagents such as dyes having 
different spectroscopic characteristics. The energy transfer pair can be chosen to obtain the 
desired combined spectral characteristics. For example, a first dye that absorbs at a 
wavelength shorter than that absorbed by the second dye can, upon absorption at that 
shorter wavelength, transfer energy to the second dye. The second dye then emits 
electromagnetic radiation at a wavelength longer than would have been emitted by the first 
dye alone. Energy transfer reagents can be particularly useful in two-color labeling 
schemes such as those set forth in a copending U.S. patent application, filed December 23, 
1996, Attorney Docket No. 2013.2, and which is a continuation-in-part of USSN
08/529,115, filed September 15, 1995, and Int'l Appln. No. WO 96/14839, filed September
13, 1996, which is also a continuation-in-part of USSN 08/670,118, filed on June 25, 1996,
which is a division of USSN 08/168,904, filed December 15, 1993, which is a continuation
of USSN 07/624,114, filed December 6, 1990. USSN 07/624,114 is a CIP of USSN
07/362,901, filed June 7, 1990, incorporated herein by reference.

This invention thus provides methods of labeling a nucleic acid and 
reagents useful therefor. Many of the methods disclosed herein involve end-labeling.
Those skilled in the art will appreciate that the invention as disclosed is generally
applicable in the chemical and molecular-biological arts. 

In one embodiment, the method involves providing a nucleic acid, 
providing a labeled oligonucleotide and enzymatically ligating the oligonucleotide to the 
nucleic acid. Thus, for example, where the nucleic acid is an RNA, a labeled 
ribooligonucleotide can be ligated using an RNA ligase. RNA ligase catalyzes the covalent
joining of single-stranded RNA (or DNA, but the reaction with RNA is more efficient)
with a 5' phosphate group to the 3'-OH end of another piece of RNA (or DNA). The
specific requirements for the use of this enzyme are provided in The Enzymes, Volume XV,
Part B, T4 RNA Ligase, Uhlenbeck and Gumport, pages 31-58; and 5.66-5.69 in
Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press,
Cold Spring Harbor, New York (1982).

This invention thus provides a method to add a label to the nucleic acid (e.g. 
extracted RNA) directly rather than incorporating labeled nucleotides in a nucleic acid 
polymerization step. This can be accomplished by adding a short labeled oligonucleotide 
to the ends of a single stranded nucleic acid. The method more fully labels a sample; a 
higher percentage of available molecules will be labeled than by conventional techniques. 

RNA can be randomly fragmented with heat in the presence of Mg2+. This
generally produces RNA fragments with 5' OH groups and phosphorylated 3' ends. A
phosphate group is added to the 5' ends of the fragments using standard protocols with T4
Polynucleotide Kinase, or similar enzyme. To the pool of 5'-phosphorylated RNA
fragments is added RNA ligase plus a short RNA oligonucleotide with a 3' OH group and a
label, either at the 5' end (such as fluorescein or other dye, or biotin for later labeling with a
streptavidin conjugate, or with digoxigenin for later labeling with a labeled antibody) or
with one or more labeled bases. A riboA6 (deoxyribonucleic acid 6 mer poly A) labeled
with either fluorescein or biotin at the 5' end provides a particularly preferred label. In
another embodiment, the ligated RNA oligonucleotide can have ribonucleotides near the
ligation end, but deoxyribonucleotides further away. Of course, the RNA oligonucleotide
can be longer or shorter and can have virtually any sequence. However, the ligation
reaction is most efficient with A and least efficient with U at the 3' end of the acceptor.
The reaction is allowed to proceed under standard conditions. Unincorporated RNA
6-mers can be removed by a simple size selection step (e.g., electrophoresis, NAP column,
etc.) if necessary following the ligation reaction.

An advantage of this procedure is that extracted mRNA can be used directly 
and that each fragment should be labeled once, not any number of times depending on the 
sequence as is the case when labeled bases are incorporated during polymerization 
reactions. 

In another embodiment, fragmented DNA can also be end-labeled using a 
different procedure with a different enzyme. Terminal transferase will add 
deoxynucleoside triphosphates (dNTPs), which can be labeled, to the 3' OH ends of single 
stranded DNA. Single dNTPs can be added if modified nucleotides are used (for example, 
dideoxynucleotide triphosphates), or multiple bases can be added if desired. DNA can be 
fragmented either physically (shearing) or enzymatically (nucleases), or chemically (e.g. 
acid hydrolysis). Following fragmentation, depending on the method, 3' OH ends may 
need to be produced. The DNA fragments are then labeled using labeled dNTPs or 
ddNTPs in the presence of terminal transferase. 

Various other embodiments are illustrated by the Examples provided herein
and their associated figures.

C) Modifying Sample to Improve Signal to Noise Ratio. 

The nucleic acid sample may be modified prior to hybridization to the high 
density probe array in order to reduce sample complexity thereby decreasing background 
signal and improving sensitivity of the measurement. In one embodiment, complexity 
reduction for expression monitoring methods is achieved by selective degradation of 
background mRNA. This is accomplished by hybridizing the sample mRNA (e.g., poly A+
RNA) with a pool of DNA oligonucleotides that hybridize specifically with the regions to 
which the probes in the expression monitoring array specifically hybridize. In a preferred 
embodiment, the pool of oligonucleotides consists of the same probe oligonucleotides as 
found on the high density array. 

The pool of oligonucleotides hybridizes to the sample mRNA forming a 
number of double stranded (hybrid duplex) nucleic acids. The hybridized sample is then 
treated with RNase A, a nuclease that specifically digests single stranded RNA. The 
RNase A is then inhibited, using a protease and/or commercially available RNase
inhibitors, and the double stranded nucleic acids are then separated from the digested
single stranded RNA. This separation may be accomplished in a number of ways well
known to those of skill in the art including, but not limited to, electrophoresis and gradient
centrifugation. However, in a preferred embodiment, the pool of DNA oligonucleotides is
provided attached to beads forming thereby a nucleic acid affinity column. After digestion
with the RNase A, the hybridized DNA is removed simply by denaturing (e.g., by adding
heat or increasing salt) the hybrid duplexes and washing the previously hybridized mRNA
off in an elution buffer.

The undigested mRNA fragments which will be hybridized to the probes in
the high density array or other solid support are then preferably end-labeled with a
fluorophore attached to an RNA linker using an RNA ligase. This procedure produces a
labeled sample RNA pool in which the nucleic acids that do not correspond to probes in
the array are eliminated and thus unavailable to contribute to a background signal.

Another method of reducing sample complexity involves hybridizing the 
mRNA with deoxyoligonucleotides that hybridize to regions that border on either side of the
regions to which the high density array probes are directed. Treatment with RNase H
selectively digests the double stranded (hybrid duplexes) leaving a pool of single-stranded
mRNA corresponding to the short regions (e.g., 20 mer) that were formerly bounded by the 
deoxyoligonucleotide probes and which correspond to the targets of the high density array 
probes and longer mRNA sequences that correspond to regions between the targets of the 
probes of the high density array. The short RNA fragments are then separated from the 
long fragments (e.g., by electrophoresis), labeled if necessary as described above, and then 
are ready for hybridization with the high density probe array. 

In a third approach, sample complexity reduction involves the selective 
removal of particular (preselected) mRNA messages. In particular, highly expressed 
mRNA messages that are not specifically probed by the probes in the high density array are 
preferably removed. This approach involves hybridizing the poly A+ mRNA with an
oligonucleotide probe that specifically hybridizes to the preselected message close to the 3'
(poly A) end. The probe may be selected to provide high specificity and low cross
reactivity. Treatment of the hybridized message/probe complex with RNase H digests the
double stranded region effectively removing the polyA+ tail from the rest of the message.
The sample is then treated with methods that specifically retain or amplify polyA+ RNA
(e.g., an oligo dT column or (dT)n magnetic beads). Such methods will not retain or
amplify the selected message(s) as they are no longer associated with a polyA+ tail. These
highly expressed messages are effectively removed from the sample providing a sample
that has reduced background mRNA.

IV. Hybridization Array Design. 

A) Probe Composition. 

One of skill in the art will appreciate that an enormous number of array
designs are suitable for the practice of this invention. Generic difference screening arrays,
for example, may include random, haphazardly selected, or arbitrary probe sets.
Alternatively, the generic difference screening arrays may include all possible
oligonucleotides of a particular pre-selected length. Conversely, other expression
monitoring arrays typically include a number of probes that specifically hybridize to the
nucleic acid(s) expression of which is to be detected. In a preferred embodiment, the array
will include one or more control probes.

1) Test probes.

In its simplest embodiment, the high density array includes "test probes"
(also referred to as probe oligonucleotides) more than 5 bases long, preferably more than
10 bases long, and some more than 40 bases long. In some embodiments, the probes are
less than 50 bases long. In some cases, these oligonucleotides range from about 5 to about
45 or 5 to about 50 nucleotides long, more preferably from about 10 to about 40
nucleotides long, and most preferably from about 15 to about 40 nucleotides in length. In
other particularly preferred embodiments the probes are 20 or 25 nucleotides in length. In
preselected expression monitoring arrays, these probe oligonucleotides have sequences
complementary to particular subsequences of the genes whose expression they are
designed to detect. Thus, the test probes are capable of specifically hybridizing to the
target nucleic acid they are to detect.
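
As a rough illustration of test probes that are complementary to subsequences of a target, the sketch below tiles a target sequence and returns the complementary probe for each window. The function names and the default length of 20 are illustrative assumptions, and only unambiguous A/C/G/T/U bases are handled.

```python
# Hypothetical helper names; handles unambiguous DNA/RNA bases only.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def make_test_probes(target: str, length: int = 20) -> list[str]:
    """Return one complementary probe per `length`-base window of the target."""
    return [
        reverse_complement(target[i:i + length])
        for i in range(len(target) - length + 1)
    ]
```

A target of L bases yields L - length + 1 candidate probes, which is why later selection steps are needed to keep only a limited set per gene.
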

In high density oligonucleotide arrays, designed for generic difference
screening, the probe oligonucleotides need not be selected to hybridize to particular
preselected subsequences of genes. To the contrary, preferred generic difference screening
arrays comprise probe oligonucleotides whose sequences are random, arbitrary, or
haphazard. Alternatively, the probe oligonucleotides may include all possible
oligonucleotides of a given length (e.g., all possible 4 mers, all possible 5 mers, all possible
6 mers, all possible 7 mers, all possible 8 mers, all possible 9 mers, all possible 10 mers,
all possible 11 mers, all possible 12 mers, etc.).
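
A probe set containing all possible oligonucleotides of a given length can be enumerated mechanically. The following is a minimal sketch; the function name `all_kmers` and the default four-letter DNA alphabet are assumptions made for illustration.

```python
from itertools import product


def all_kmers(k: int, alphabet: str = "ACGT"):
    """Yield every sequence of length k over the alphabet, in lexicographic order."""
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)
```

Because the generator is lazy, large values of k can be streamed (for example into a synthesis layout) without holding every sequence in memory at once.
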

A random oligonucleotide array is an array in which the pool of nucleotide
sequences of a particular length does not significantly deviate from a pool of nucleotide
sequences selected in a random manner (i.e., blind, unbiased selection) from a collection of
all possible sequences of that length.

Arbitrary or haphazard nucleotide arrays of probe oligonucleotides are
arrays in which the probe oligonucleotides are selected without identifying and/or
preselecting target nucleic acids. Arbitrary or haphazard nucleotide arrays may
approximate or even be random, however there is no assurance that they meet a statistical
definition of randomness.

The arrays may reflect some nucleotide selection based on probe 
composition, and/or non-redundancy of probes, and/or coding sequence bias as described 
herein. In a preferred embodiment, however, such "biased" probe sets are still not chosen
to be specific for any particular genes. 

An array comprising all possible oligonucleotides of a particular length
refers to an array that contains oligonucleotides having sequences corresponding to
substantially every permutation of a sequence. Thus since the probe oligonucleotides of
this invention preferably include up to 4 bases (A, G, C, T) or (A, G, C, U) or derivatives
of these bases, an array having all possible oligonucleotides of length X contains
substantially 4^X different nucleic acids (e.g., 16 different nucleic acids for a 2 mer, 64
different nucleic acids for a 3 mer, 65536 different nucleic acids for an 8 mer, etc.). It will
be appreciated that some small number of sequences may be inadvertently absent from a
pool of all possible oligonucleotides of a particular length (due to synthesis problems,
inadvertent cleavage, etc.). Thus, it will be appreciated that an array comprising all possible
oligonucleotides of length X refers to an array having substantially all possible
oligonucleotides of length X. Substantially all possible oligonucleotides of length X
includes more than 90%, typically more than 95%, preferably more than 98%, more
preferably more than 99%, and most preferably more than 99.9% of the possible number of
different oligonucleotides.
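
The counting argument and the "substantially all" thresholds can be checked with a few lines of code. This is only a sketch: the function names and the choice to ignore sequences of the wrong length or alphabet are assumptions, not part of the disclosure.

```python
def expected_probe_count(length: int, n_bases: int = 4) -> int:
    """Number of distinct sequences of the given length (4^X for four bases)."""
    return n_bases ** length


def coverage(present, length: int, alphabet: str = "ACGT") -> float:
    """Fraction of all possible sequences of `length` found in `present`."""
    allowed = set(alphabet)
    valid = {s for s in present if len(s) == length and set(s) <= allowed}
    return len(valid) / expected_probe_count(length, len(alphabet))


def is_substantially_complete(present, length: int, threshold: float = 0.90) -> bool:
    """True when more than `threshold` of the possible sequences are present."""
    return coverage(present, length) > threshold
```

For instance, a pool holding 15 of the 16 possible 2 mers has a coverage of 0.9375, which passes the 90% criterion but not the 95% one.
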

The probe oligonucleotides described above can additionally include a 
constant domain. A constant domain is a nucleotide subsequence that is common to
substantially all of the probe oligonucleotides. Particularly preferred constant domains are
located at the terminus of the oligonucleotide probe closest to the substrate (i.e., attached to 
the linker/anchor molecule). The constant regions may comprise virtually any sequence. 
However, in one embodiment, the constant regions comprise a sequence or subsequence 
complementary to the sense or antisense strand of a restriction site (a nucleic acid sequence 
recognized by a restriction endonuclease). 

The constant domain can be synthesized de novo on the array. 
Alternatively, the constant region may be prepared in a separate procedure and then 
coupled intact to the array. Since the constant domain can be synthesized separately and
then the intact constant subsequences coupled to the high density array, the constant
domain can be virtually any length. Some constant domains range from 3 nucleotides to
about 500 nucleotides in length, more typically from about 3 nucleotides in length to about
100 nucleotides in length, most typically from 3 nucleotides in length to about 50
nucleotides in length. In particular embodiments, constant domains range from 3 
nucleotides to about 45 nucleotides in length, more preferably from 3 nucleotides in length 
to about 25 nucleotides in length and most preferably from 3 to about 15 or even 10 
nucleotides in length. In other embodiments, preferred constant regions range from about 
5 nucleotides to about 15 nucleotides in length. 

In addition to test probes that bind the target nucleic acid(s) of interest, the 
high density array can contain a number of control probes. The control probes fall into 
three categories referred to herein as 1) Normalization controls; 2) Expression level 
controls; and 3) Mismatch controls. 
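2) Normalization controls.

Normalization controls are oligonucleotide probes that are perfectly
complementary to labeled reference oligonucleotides that are added to the nucleic acid
sample. The signals obtained from the normalization controls after hybridization provide a
control for variations in hybridization conditions, label intensity, "reading" efficiency and
other factors that may cause the signal of a perfect hybridization to vary between arrays. In
a preferred embodiment, signals (e.g., fluorescence intensity) read from all other probes in
the array are divided by the signal (e.g., fluorescence intensity) from the control probes
thereby normalizing the measurements.

Virtually any probe may serve as a normalization control. However, it is
recognized that hybridization efficiency varies with base composition and probe length.
Preferred normalization probes are selected to reflect the average length of the other probes
present in the array; however, they can be selected to cover a range of lengths. The
normalization control(s) can also be selected to reflect the (average) base composition of
the other probes in the array; however, in a preferred embodiment, only one or a few
normalization probes are used.

Normalization probes can be localized at any position in the array or at
multiple positions throughout the array to control for spatial variation in hybridization
efficiency. In a preferred embodiment, the normalization controls are located at the
corners or edges of the array as well as in the middle.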
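
The normalization step, in which signals from the other probes are divided by the signal from the normalization control probes, might be sketched as follows. Averaging several control probes into one reference level is an assumption made here for illustration, as are the function and parameter names.

```python
from statistics import mean


def normalize_signals(signals: dict, control_ids) -> dict:
    """Divide each non-control probe signal by the mean control signal.

    `signals` maps a probe identifier to its raw intensity; `control_ids`
    names the normalization control probes (hypothetical identifiers).
    """
    controls = set(control_ids)
    control_level = mean(signals[c] for c in controls)
    if control_level <= 0:
        raise ValueError("normalization control signal must be positive")
    return {
        probe: value / control_level
        for probe, value in signals.items()
        if probe not in controls
    }
```

Normalized values from two arrays can then be compared directly, since array-wide differences in label intensity or reading efficiency scale the control and test signals alike.
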



3) Expression level controls.

Expression level controls are probes that hybridize specifically with
constitutively expressed genes in the biological sample. [...] expected to decrease.
The converse is also true. Thus where the expression levels of both
an expression level control and the target gene appear to both decrease or to both increase,
the change may be attributed to changes in the metabolic activity of the cell as a whole, not
to differential expression of the target gene in question. Conversely, where the expression
levels of the target gene and the expression level control do not covary, the variation in the
expression level of the target gene is attributed to differences in regulation of that gene and
not to overall variations in the metabolic activity of the cell.

Virtually any constitutively expressed gene provides a suitable target for
expression level controls. Typically expression level control probes have sequences
complementary to subsequences of constitutively expressed "housekeeping genes"
including, but not limited to, the β-actin gene, the transferrin receptor gene, the [...]
gene, and the like.

4) Mismatch controls. 

Mismatch controls may also be provided for the probes to the target genes,
for expression level controls or for normalization controls. Mismatch controls are
oligonucleotide probes identical to their corresponding test or control probes except for the
presence of one or more mismatched bases. A mismatched base is a base selected so that it
is not complementary to the corresponding base in the target sequence to which the probe
would otherwise specifically hybridize. One or more mismatches are selected such that
under appropriate hybridization conditions (e.g. stringent conditions) the test or control
probe would be expected to hybridize with its target sequence, but the mismatch probe
would not hybridize (or would hybridize to a significantly lesser extent). Preferred
mismatch probes contain a central mismatch. Thus, for example, where a probe is a 20
mer, a corresponding mismatch probe will have the identical sequence except for a single
base mismatch (e.g., substituting a G, a C or a T for an A) at any of positions 6 through 14
(the central mismatch).

In "generic" (e.g., random, arbitrary, haphazard, etc.) arrays, since the target
nucleic acid(s) are unknown, perfect match and mismatch probes cannot be a priori
determined, designed, or selected. In this instance, the probes are preferably provided as
pairs where each pair of probes differ in one or more preselected nucleotides. Thus, while
it is not known a priori which of the probes in the pair is the perfect match, it is known
that when one probe specifically hybridizes to a particular target sequence, the other probe
of the pair will act as a mismatch control for that target sequence. It will be appreciated
that the perfect match and mismatch probes need not be provided as pairs, but may be
provided as larger collections (e.g., 3, 4, 5, or more) of probes that differ from each other
in particular preselected nucleotides.

In both expression monitoring and generic difference screening arrays, 
mismatch probes provide a control for non-specific binding or cross-hybridization to a 
nucleic acid in the sample other than the target to which the probe is complementary.
Mismatch probes thus indicate whether a hybridization is specific or not. For example, if
the complementary target is present the perfect match probes should be consistently
brighter than the mismatch probes. In addition, if all central mismatches are present, the
mismatch probes can be used to detect a mutation. Finally, it was also a discovery of the
present invention that the difference in intensity between the perfect match and the
mismatch probe (I(PM)-I(MM)) provides a good measure of the concentration of the
hybridized material.
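
The two uses of perfect match (PM) and mismatch (MM) intensities described above, a specificity check and a concentration measure based on I(PM)-I(MM), can be illustrated with a short sketch. Averaging the differences over several probe pairs is an assumption made here; the function names are hypothetical.

```python
def pm_mm_differences(pairs):
    """Return I(PM) - I(MM) for each (pm_intensity, mm_intensity) pair."""
    return [pm - mm for pm, mm in pairs]


def average_difference(pairs) -> float:
    """Mean of I(PM) - I(MM) across probe pairs (assumed summary statistic)."""
    diffs = pm_mm_differences(pairs)
    return sum(diffs) / len(diffs)


def fraction_pm_brighter(pairs) -> float:
    """Fraction of pairs in which the perfect match is brighter than its mismatch."""
    return sum(1 for pm, mm in pairs if pm > mm) / len(pairs)
```

A fraction close to 1 is consistent with specific hybridization, while the average difference tracks the amount of hybridized material.
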

5) Sample preparation/amplification/quantitation controls.

The high density array may also include sample preparation/amplification 
control probes. These are probes that are complementary to subsequences of control genes 
selected because they do not normally occur in the nucleic acids of the particular biological 
sample being assayed. Suitable sample preparation/amplification control probes include, 
for example, probes to bacterial genes (e.g., Bio B) where the sample in question is a
biological sample from a eukaryote.

The RNA sample is then spiked with a known amount of the nucleic acid to 
which the sample preparation/amplification control probe is directed before processing. 
Quantification of the hybridization of the sample preparation/amplification control probe 
then provides a measure of alteration in the abundance of the nucleic acids caused by 
processing steps (e.g. PCR, reverse transcription, in vitro transcription, etc.). 

Quantitation controls are similar. Typically they are combined with the 
sample nucleic acid(s) in known amounts prior to hybridization. They are useful to 
provide a quantitation reference and permit determination of a standard curve for
quantifying hybridization amounts (concentrations).
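
A standard curve from quantitation controls spiked in at known amounts could be built as in the sketch below. A straight-line relationship between concentration and intensity is assumed purely for illustration; the text does not specify the form of the curve, and the function names are hypothetical.

```python
def fit_line(concentrations, intensities):
    """Ordinary least-squares fit: intensity = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_intensity(intensity, slope, intercept):
    """Invert the fitted line to estimate concentration from a measured intensity."""
    return (intensity - intercept) / slope
```

In practice the fit would only be trusted within the range spanned by the spiked controls, since hybridization signals saturate at high concentrations.
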

B) Probe Selection and Optimization. 

i) Generic difference screening arrays 

a) Assumption-free probe selection. 
As explained above, probe oligonucleotide selection for generic difference
screening arrays can be random, arbitrary, haphazard, composition biased, or include all
possible oligonucleotides of a particular length. Probe choice is thus essentially
assumption free. In some embodiments, however, particular oligonucleotides may be
excluded from the array or from analysis. For example, probes that contain palindromic
sequences or probes that contain long stretches of all As, Cs, Gs, Ts, etc. may be excluded.

Probes that show an unacceptable variation (variation above a particular threshold
level) in hybridization intensity against the same sample may be excluded (either in the
array or in the analysis). The exclusion threshold is a function of the sensitivity desired of
the assay. The more sensitive an assay is desired, the lower the exclusion threshold is set.
In a preferred embodiment the probe is excluded when the variation in hybridization
intensity exceeds 2 times the background signal and has a relative variation of more than
50%.
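
The two-part exclusion criterion (variation above 2 times background and relative variation above 50%) can be written as a simple predicate. The text does not define how variation is measured, so this sketch assumes the range (maximum minus minimum) across replicate hybridizations and divides by the mean for the relative figure; both choices, and the names used, are assumptions.

```python
def exclude_probe(replicates, background, abs_factor=2.0, rel_limit=0.5) -> bool:
    """Return True when a probe fails both variation criteria.

    `replicates` holds intensities of one probe hybridized repeatedly to the
    same sample; `background` is the background signal level.
    """
    variation = max(replicates) - min(replicates)   # assumed measure of variation
    mean_intensity = sum(replicates) / len(replicates)
    relative = variation / mean_intensity if mean_intensity > 0 else float("inf")
    return variation > abs_factor * background and relative > rel_limit
```

Lowering `abs_factor` or `rel_limit` makes the filter stricter, which corresponds to setting a lower exclusion threshold for a more sensitive assay.
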

Alternatively such exclusion may be inherent in the selective identification
of differentially hybridizing sequences where the difference between a test nucleic acid
sample and a reference nucleic acid sample is compared to the difference between the
reference nucleic acid sample and itself. This is described more fully below in Section
IX(B).

b) Exploitation of codon degeneracy.
In another embodiment, species-specific codon usage can be exploited to
utilize a longer (and hence more specific and stable) probe without increasing the number
of probe oligonucleotides necessary to hybridize to all possible sequences. Amino acids
are conserved in the first and second positions of their codons, while the third
position is highly redundant. Moreover each species or organism favors particular codons
to encode any particular amino acid. The preferred codon for a particular amino acid in a
particular species is the codon that is used at the highest frequency for that species.
Codon preferences are well known to those of skill in the art. They can also be readily
determined by a simple frequency analysis of the nucleotide sequences of a particular
organism or species.
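
The "simple frequency analysis" mentioned above might look like the following sketch, which counts in-frame codons across coding sequences and reports the most frequent codon for each amino acid. The standard genetic code is assumed, sequences are assumed to start in frame, and the function name is hypothetical.

```python
from collections import Counter

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def preferred_codons(coding_seqs):
    """Map each amino acid (or '*' for stop) to its most frequently used codon."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):      # step through in-frame codons
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:             # skip codons with ambiguous bases
                counts[codon] += 1
    best = {}
    for codon, n in counts.items():
        amino = CODON_TABLE[codon]
        if amino not in best or n > counts[best[amino]]:
            best[amino] = codon
    return best
```

Run over a large set of coding sequences from one organism, the result approximates that organism's codon preference table.
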

Similarly, the di-, tri-, tetra-nucleotide frequency biases of a particular
organism or species can be used to weight the selection of oligonucleotide probes used in
a "composition biased" generic difference screening array.

In one preferred embodiment, the probe oligonucleotides are prepared
having the first two nucleotides in each codon being fixed but allowing the third nucleotide
to vary (either by use of a 4 way wobble or by the use of inosine or other non-specifically
hybridizing base). In a preferred embodiment each codon of the probe will have the
general formula

3'-X1-X2-I-5'

where I is inosine or a 4-way wobble and X1 and X2 are A, G, C, T/U selected according to
the preferred codon usage for a particular species. Thus, for example, an array of 16 mers
that will hybridize to substantially all nucleic acids of a particular species can be prepared
where the probes have the formula:

Support-I-X'X'I'-^X'-X^!" -X"X"l"-X"X"X»-3 
with only 4» different probe oligonucleotides. Suitable codons for this probe are
illustrated in Table 1.

Table 1. Preferred sequences for generic coding sequence 16 mer probe oligonucleotides.
(Derived from standard table of amino acid codons (the genetic code).)

The affinity of the probes may be further enhanced by the inclusion of
additional inosines (or 4-way, 3-way, or 2-way wobbles, or other generic bases) at the 3'
and 5' ends of the oligonucleotide probes. These codon usage biased probes can be used in
conjunction with a ligase discrimination to further increase obtainable sequence
information. Thus, for example, where the hybridization to an array comprising the above-
described 16 mers also includes a ligation with one or more ligatable oligonucleotides of
fixed length N, whose sequence is known, each successful ligation provides 16 + N
nucleotides of sequence information.
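
The idea of fixing the first two positions of each codon and placing inosine (or a wobble) at the third can be sketched in a few lines, together with the 16 + N bookkeeping for ligation. This is a simplified illustration: codons are written in reading order, strand orientation and complementarity are ignored, and all names are hypothetical.

```python
def wobble_codon(codon: str, wobble: str = "I") -> str:
    """Keep the first two bases of a codon and replace the third with a wobble symbol."""
    return codon[:2] + wobble


def wobble_probe(codons, wobble: str = "I") -> str:
    """Concatenate wobble codons into one degenerate probe string."""
    return "".join(wobble_codon(c, wobble) for c in codons)


def sequence_information(probe_length: int, ligated_length: int) -> int:
    """Nucleotides of sequence read per successful ligation (probe length + N)."""
    return probe_length + ligated_length
```

For example, `sequence_information(16, 6)` gives 22, matching the 16 + N rule for a ligatable 6 mer.
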



ii) Expression monitoring arrays. 

In a preferred embodiment, oligonucleotide probes in the expression 
monitoring high density array are selected to bind specifically to the nucleic acid target to
which they are directed with minimal non-specific binding or cross-hybridization under the
particular hybridization conditions utilized. Because the high density arrays of this
invention can contain in excess of 1,000,000 different probes, it is possible to provide
every probe of a characteristic length that binds to a particular nucleic acid sequence.
Thus, for example, the high density array can contain every possible 20 mer sequence
complementary to an IL-2 mRNA.
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There may exist, however, 20 mer subsequences .hat are no. unique to the 
IL .2 mRNA. Probes directed to these subsequences are expected to cross hybridize „„h 
occurrences of their comp.emenu.ry sequence in other regions of the sample genome. 
Similarly, other probes simply may no. hybridize effectively under the hybridizanon 
5 conditions (,*, due to secondary structure, or interactions with .he substrate or other 

probes) Thus, in a preferred embodiment, the probes that show such poor specficy or 
hybridization efficiency are iden.if.ed and may no, be included either in me high density 
array itself (.*. during fabrication of the array) or in the post-hybridization dam analy,, 
In addition, in a preferred embodiment expression monitonng arrays are 
10 used to identify the presence and expression (Option) .eve. of genes which are 
several hundred base pairs long or longer. For most applications it would be useful ,o 
identify the presence, absence, or expression .evel of several thousand to one hundred 
thousand genes. Because the number of oligonucleotides per array is limited, in a preferred 
embodiment, i, is desired to inc.ude only a limited se, of probes specifc to each gene 
15 whose expression is to be detected. 

a) Hybridization and cross-hybridization data. 
Thus, in one embodiment, this invention provides for a method of
optimizing a probe set for detection of a particular gene. Generally, this method involves
providing a high density array containing a multiplicity of probes of one or more particular
length(s) that are complementary to subsequences of the mRNA transcribed by the target
gene. In one embodiment the high density array may contain every probe of a particular
length that is complementary to a particular mRNA. The probes of the high density array
are then hybridized with their target nucleic acid alone and then hybridized with a high
complexity, high concentration nucleic acid sample that does not contain the targets
complementary to the probes. Thus, for example, where the target nucleic acid is an RNA,
the probes are first hybridized with their target nucleic acid alone and then hybridized with
RNA made from a cDNA library (e.g., reverse transcribed polyA+ mRNA) where the sense
of the hybridized RNA is opposite that of the target nucleic acid (to insure that the high
complexity sample does not contain targets for the probes). Those probes that show a
strong hybridization signal with their target and little or no cross-hybridization with the
high complexity sample are preferred probes for use in the high density arrays of this
invention.

The high density array may additionally contain mismatch controls for each
of the probes to be tested. In a preferred embodiment, the mismatch controls contain a
central mismatch. Where both the mismatch control and the target probe show high levels
of hybridization (e.g., the hybridization to the mismatch is nearly equal to or greater than
the hybridization to the corresponding test probe), the test probe is preferably not used in
the high density array.

In a particularly preferred embodiment, optimal probes are selected
according to the following method. First, as indicated above, an array is provided
containing a multiplicity of oligonucleotide probes complementary to subsequences of the
target nucleic acid. The oligonucleotide probes may be of a single length or may span a
variety of lengths. The high density array may contain every probe of a particular length
that is complementary to a particular mRNA or may contain probes selected from various
regions of particular mRNAs. For each target-specific probe the array also contains a
mismatch control probe; preferably a central mismatch control probe.

The oligonucleotide array is hybridized to a sample containing target
nucleic acids having subsequences complementary to the oligonucleotide probes and the
difference in hybridization intensity between each probe and its mismatch control is
determined. Only those probes where the difference between the probe and its mismatch
control exceeds a threshold hybridization intensity (e.g., preferably greater than 10% of the
background signal intensity, more preferably greater than 20% of the background signal
intensity and most preferably greater than 50% of the background signal intensity) are
selected. Thus, only probes that show a strong signal compared to their mismatch control
are selected.
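
The first round of selection described above can be sketched in C as follows. This is only an illustrative sketch: the ProbePair structure, the function name and the way the threshold is passed (as a fraction of the background signal intensity, e.g. 0.10, 0.20 or 0.50) are assumptions made for the example and are not part of the disclosure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: measured intensities for one probe and its
 * mismatch control. */
typedef struct {
    double pm;   /* intensity of the target-specific (test) probe */
    double mm;   /* intensity of its mismatch control probe       */
} ProbePair;

/*
 * Marks each probe whose (pm - mm) difference exceeds
 * fraction * background. Writes 1 (selected) or 0 (rejected) into
 * keep[] and returns the number of probes selected.
 */
size_t select_by_mismatch_difference(const ProbePair *pairs, size_t n,
                                     double background, double fraction,
                                     int *keep)
{
    size_t kept = 0;
    double threshold = fraction * background;

    assert(pairs != NULL && keep != NULL);
    for (size_t i = 0; i < n; i++) {
        keep[i] = (pairs[i].pm - pairs[i].mm) > threshold;
        if (keep[i])
            kept++;
    }
    return kept;
}
```

With a background of 100 and a fraction of 0.50, for instance, only probes whose signal exceeds that of their mismatch control by more than 50 intensity units would be retained.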

The probe optimization procedure can optionally include a second round of
selection. In this selection, the oligonucleotide probe array is hybridized with a nucleic
acid sample that is not expected to contain sequences complementary to the probes. Thus,
for example, where the probes are complementary to the RNA sense strand a sample of
antisense RNA is provided. Of course, other samples could be provided such as samples
... probes where both the probe and its mismatch control show ... intensity, and most
preferably equal ... are selected. In this way probes that show minimal non-specific
binding are selected. Finally, ... the array, for subsequent data analysis. Of course, ...

b) Heuristic rules. 

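Using the hybridization and cross-hybridization data obtained as described
above, graphs can be made of hybridization and cross-hybridization intensities versus
various probe properties, e.g., number of As, number of Cs in a window of 8 bases, etc.
The graphs can then be examined for correlations between those properties and the
hybridization or cross-hybridization intensities, and thresholds can be set beyond which
hybridization is always poor or cross-hybridization is always very strong. If any probe
fails one of the criteria, it is rejected and, therefore, not placed on the chip. This will be
called the heuristic rules method.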

One set of rules developed for 20 mer probes in this manner is the

following: 

Hybridization rules: 

1) Number of As is less than 9. 

2) Number of Ts is less than 10 and greater than 0. 

3) Maximum run of As, Gs, or Ts is less than 4 bases in a row. 
4) Maximum run of any 2 bases is less than 11 bases.

5) Palindrome score is less than 6.

6) Clumping score is less than 6.

7) Number of As + number of Ts is less than 14.

8) Number of As + number of Gs is less than 15.

With respect to rule number 4, requiring the maximum run of any two bases to be less than
11 bases guarantees that at least three different bases occur within any 12 consecutive
nucleotides. A palindrome score is the maximum number of complementary bases if the
oligonucleotide is folded over at a point that maximizes self complementarity. Thus, for
example a 20 mer that is perfectly self-complementary would have a palindrome score of
10. A clumping score is the maximum number of three-mers of identical bases in a given
sequence. Thus, for example, a run of 5 identical bases will produce a clumping score of 3
(bases 1-3, bases 2-4, and bases 3-5).

If any probe failed one of these criteria (1-8), the probe was not a member
of the subset of probes placed on the chip. For example, if a hypothetical probe was
5'-AGCTTTTTTCATGCATCTAT-3' the probe would not be synthesized on the chip
because it has a run of four or more bases (i.e., run of six).

The cross hybridization rules developed for 20 mers were as follows: 

1) Number of Cs is less than 8; 

2) Number of Cs in any window of 8 bases is less than 4. 

Thus, if any probe failed any of either the hybridization rules (1-8) or the
cross-hybridization rules (1-2), the probe was not a member of the subset of probes placed
on the chip. These rules eliminated many of the probes that cross hybridized strongly or
exhibited low hybridization, and performed a moderate job of eliminating weakly
hybridizing probes.

These heuristic rules may be implemented by hand calculations, or 
alternatively, they may be implemented in software as is discussed below in Section XII. 
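
By way of illustration only, the hybridization rules (1-8) and cross-hybridization rules (1-2) listed above might be coded along the following lines. The function names are hypothetical. The sketch also makes two assumptions: the fold point for the palindrome score is taken to lie between two adjacent bases with no intervening loop, and the clumping score is taken as the count of overlapping three-mers of identical bases, consistent with the example of a run of 5 bases scoring 3.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Lower-case a base without passing a negative char to tolower(). */
static char lc(char c) { return (char)tolower((unsigned char)c); }

/* Number of occurrences of base b ('a', 'c', 'g' or 't') in probe s. */
int count_base(const char *s, char b) {
    int n = 0;
    assert(s != NULL);
    for (; *s; s++) if (lc(*s) == b) n++;
    return n;
}

/* Longest run of consecutive bases drawn only from the set `allowed`. */
int longest_run(const char *s, const char *allowed) {
    int best = 0, run = 0;
    for (; *s; s++) {
        run = strchr(allowed, lc(*s)) ? run + 1 : 0;
        if (run > best) best = run;
    }
    return best;
}

/* Clumping score: number of overlapping three-mers of identical bases. */
int clumping_score(const char *s) {
    int n = 0;
    for (size_t i = 0; s[i] && s[i + 1] && s[i + 2]; i++)
        if (lc(s[i]) == lc(s[i + 1]) && lc(s[i]) == lc(s[i + 2])) n++;
    return n;
}

static int pairs_with(char x, char y) {
    x = lc(x); y = lc(y);
    return (x == 'a' && y == 't') || (x == 't' && y == 'a') ||
           (x == 'c' && y == 'g') || (x == 'g' && y == 'c');
}

/* Palindrome score: best count of complementary pairs over all fold
 * points (assumed to fall between two bases, with no loop). */
int palindrome_score(const char *s) {
    int len = (int)strlen(s), best = 0;
    for (int fold = 1; fold < len; fold++) {
        int score = 0;
        for (int k = 0; fold - 1 - k >= 0 && fold + k < len; k++)
            if (pairs_with(s[fold - 1 - k], s[fold + k])) score++;
        if (score > best) best = score;
    }
    return best;
}

/* Returns 1 if the probe passes hybridization rules 1-8, else 0. */
int passes_hybridization_rules(const char *s) {
    static const char *two_base_sets[] = {"ac", "ag", "at", "cg", "ct", "gt"};
    int a = count_base(s, 'a'), g = count_base(s, 'g'), t = count_base(s, 't');
    if (a >= 9) return 0;                                      /* rule 1 */
    if (t >= 10 || t <= 0) return 0;                           /* rule 2 */
    if (longest_run(s, "a") >= 4 || longest_run(s, "g") >= 4 ||
        longest_run(s, "t") >= 4) return 0;                    /* rule 3 */
    for (int i = 0; i < 6; i++)
        if (longest_run(s, two_base_sets[i]) >= 11) return 0;  /* rule 4 */
    if (palindrome_score(s) >= 6) return 0;                    /* rule 5 */
    if (clumping_score(s) >= 6) return 0;                      /* rule 6 */
    if (a + t >= 14) return 0;                                 /* rule 7 */
    if (a + g >= 15) return 0;                                 /* rule 8 */
    return 1;
}

/* Returns 1 if the probe passes cross-hybridization rules 1-2, else 0. */
int passes_cross_hybridization_rules(const char *s) {
    size_t len = strlen(s);
    if (count_base(s, 'c') >= 8) return 0;                     /* rule 1 */
    for (size_t start = 0; start + 8 <= len; start++) {        /* rule 2 */
        int c = 0;
        for (size_t k = 0; k < 8; k++)
            if (lc(s[start + k]) == 'c') c++;
        if (c >= 4) return 0;
    }
    return 1;
}
```

A probe would be placed on the chip only if both passes_hybridization_rules and passes_cross_hybridization_rules return 1. The hypothetical probe 5'-AGCTTTTTTCATGCATCTAT-3' discussed above is rejected by this sketch.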

c) Neural net. 

In another embodiment, a neural net can be trained to predict the 
hybridization and cross-hybridization intensities based on the sequence of the probe or on 
other probe properties. The neural net can then be used to pick an arbitrary number of the
"best" probes. One such neural net was developed for selecting 20-mer probes. This
neural net produced a moderate (0.7) correlation between predicted intensity and
measured intensity, with a better model for cross hybridization than hybridization. Details
of this neural net are provided in Example 6.

d) ANOVA Model 
An analysis of variance (ANOVA) model may be built to model the
intensities based on positions of consecutive base pairs. This is based on the theory that
the melting energy is based on stacking energies of consecutive bases. The ANOVA model
was used to find correlation between the probe sequence and the hybridization and cross-
hybridization intensities. The inputs were probe sequences broken down into consecutive
base pairs. One model was made to predict hybridization, another was made to predict
cross hybridization. The output was the hybridization or cross-hybridization intensity.

There were 304 (19 * 16) possible inputs, consisting of the 16 possible two
base combinations, and the 19 positions that those combinations could be found in. For
example, the sequence aggctga... has "ag" in the first position, "gg" in the second position,
"gc" in the third, "ct" in the fourth and so on.

The resulting model assigned a component of the output intensity to each of
the possible inputs, so to estimate the intensity for a given sequence one simply
adds the intensities for each of its 19 components.
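
The additive prediction described above may be sketched as follows. The layout of the table of fitted components (a flat array of 19 * 16 values, indexed by position and then by two base combination in the order aa, ac, ag, at, ca, ...) and the function names are assumptions made for the example; the fitted values themselves would come from the ANOVA model and are not given here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define N_POSITIONS 19   /* two-base start positions in a 20 mer */
#define N_DINUC     16   /* 4 x 4 two base combinations          */

/* Maps a base to 0..3 (a, c, g, t); returns -1 for anything else. */
static int base_code(char b) {
    switch (tolower((unsigned char)b)) {
        case 'a': return 0;
        case 'c': return 1;
        case 'g': return 2;
        case 't': return 3;
        default:  return -1;
    }
}

/* Index 0..15 of the two base combination (first, second), or -1. */
int dinucleotide_index(char first, char second) {
    int f = base_code(first), s = base_code(second);
    return (f < 0 || s < 0) ? -1 : 4 * f + s;
}

/*
 * Adds the fitted component for the two base combination found at each
 * of the 19 positions of a 20 mer. effects[] holds N_POSITIONS * N_DINUC
 * values. Returns 0 on success, or -1 if the probe is not a 20 mer made
 * up of a, c, g and t.
 */
int predict_intensity(const char *probe, const double *effects,
                      double *intensity) {
    double sum = 0.0;

    assert(probe != NULL && effects != NULL && intensity != NULL);
    for (size_t pos = 0; pos < N_POSITIONS; pos++) {
        int d;
        if (probe[pos] == '\0' || probe[pos + 1] == '\0')
            return -1;                       /* shorter than 20 bases */
        d = dinucleotide_index(probe[pos], probe[pos + 1]);
        if (d < 0)
            return -1;                       /* not a, c, g or t      */
        sum += effects[pos * N_DINUC + (size_t)d];
    }
    if (probe[N_POSITIONS + 1] != '\0')
        return -1;                           /* longer than 20 bases  */
    *intensity = sum;
    return 0;
}
```

One table would be fitted for hybridization and a second for cross hybridization, and the same function applied to each.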

e) Pruning (removal) of similar probes. 
One of the causes of poor signals in expression chips is that genes other 
than the ones being monitored have sequences which are very similar to parts of the 
sequences which are being monitored. The easiest way to solve this is to remove probes 
which are similar to more than one gene. Thus, in a preferred embodiment, it is desirable 
to remove (prune) probes that hybridize to transcription products of more than one gene. 

The simplest pruning method is to line up a proposed probe with all known
genes for the organism being monitored, then count the number of matching bases. For
example, given a probe to gene 1 of an organism and gene 2 of an organism as follows:

probe from gene 1:  aagcgcgatcgattatgctc

gene 2:             atctcggatcgatcggataagcgcgatcgattatgctcggcga

the probe has 8 matching bases in this alignment, but 20 matching bases in the following
alignment:

probe from gene 1:                    aagcgcgatcgattatgctc
                                      ||||||||||||||||||||
gene 2:             atctcggatcgatcggataagcgcgatcgattatgctcggcga

More complicated algorithms also exist, which allow the detection of insertion or deletion 
mismatches. Such sequence alignment algorithms are well known to those of skill in the art 
and include, but are not limited to BLAST, or FASTA, or other gene matching programs 
such as those described above in the definitions section. 
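
The simple (ungapped) pruning comparison described above can be sketched as follows. The function names are hypothetical, and the number of matching bases above which a probe would actually be pruned is left to the caller, since no specific cutoff is stated here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Number of matching bases when the probe is laid over the gene
 * starting at the given offset (no insertions or deletions). */
int count_matches_at(const char *probe, const char *gene, size_t offset) {
    size_t plen, glen;
    int matches = 0;

    assert(probe != NULL && gene != NULL);
    plen = strlen(probe);
    glen = strlen(gene);
    for (size_t i = 0; i < plen && offset + i < glen; i++)
        if (tolower((unsigned char)probe[i]) ==
            tolower((unsigned char)gene[offset + i]))
            matches++;
    return matches;
}

/* Slides the probe along the gene and returns the largest number of
 * matching bases found at any offset where the probe fits entirely.
 * The offset of the best alignment is stored in *best_offset if it is
 * not NULL. */
int best_ungapped_match(const char *probe, const char *gene,
                        size_t *best_offset) {
    size_t plen = strlen(probe), glen = strlen(gene);
    int best = 0;

    if (best_offset) *best_offset = 0;
    for (size_t off = 0; off + plen <= glen; off++) {
        int m = count_matches_at(probe, gene, off);
        if (m > best) {
            best = m;
            if (best_offset) *best_offset = off;
        }
    }
    return best;
}
```

Applied to the probe and gene 2 shown above, the best alignment is found 18 bases into gene 2, where all 20 bases match, so the probe would be a candidate for pruning.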

In another variant, where an organism has many different genes which are
very similar, it is difficult to make a probe set that measures the concentration of only one
of those very similar genes. One can then prune out any probes which are dissimilar, and
make the probe set a probe set for that family of genes.



f) Synthesis cycle pruning.
The cost of producing masks for a chip is approximately linearly related to
the number of synthesis cycles. In a normal set of genes the distribution of the number of
cycles any probe takes to build approximates a Gaussian distribution. Because of this the
mask cost can normally be reduced by 15% by throwing out about 3 percent of the probes.
In a preferred embodiment, synthesis cycle pruning simply involves eliminating (not
including) those probes that require a greater number of synthesis cycles than
the maximum number of synthesis cycles selected for preparation of the particular subject
high density oligonucleotide array. Since the typical synthesis of probes follows a regular
pattern of bases put down (acgtacgtacgt...) counting the number of synthesis steps needed
to build a probe is easy. The listing shown in Table 1 provides typical code for counting the
number of synthesis cycles a probe will need.

Table 1. Typical code for counting synthesis cycles required for the chemical synthesis of
a probe.

#include <ctype.h>
#include <string.h>

void errorHwnd( const char *message );   // error reporting routine, defined elsewhere

static char base[] = "acgt";

//                           a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
static short baseIndex[] = { 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0 };

// Returns the position (0-3) of a lower case base in the a, c, g, t synthesis order.
short lookupIndex( char aBase ){
    if( !isalpha( (unsigned char)aBase ) ){
        errorHwnd( "illegal base" );
        return -1;
    }
    if( strchr( base, aBase ) == NULL ){
        errorHwnd( "non-dna base" );
        return 0;
    }
    return baseIndex[ aBase - 'a' ];
}

short calculateMinNumberOfSynthesisStepsForComplement( const char *buffer ){
    short i, last, current, cycles = 1;
    char buffer1[40];

    // Build the complement of the probe sequence (at most 39 bases).
    for( i = 0; buffer[i] != 0 && i < 39; i++ ){
        switch( tolower( (unsigned char)buffer[i] ) ){
            case 'a': buffer1[i] = 't'; break;
            case 'c': buffer1[i] = 'g'; break;
            case 'g': buffer1[i] = 'c'; break;
            case 't': buffer1[i] = 'a'; break;
            default:  buffer1[i] = buffer[i]; break;   // reported by lookupIndex below
        }
    }
    buffer1[i] = 0;

    if( buffer1[0] == 0 ) return 0;
    last = current = lookupIndex( buffer1[0] );
    for( i = 1; buffer1[i] != 0; i++ ){
        current = lookupIndex( buffer1[i] );
        if( current <= last ) cycles++;   // base cannot be added in the current a-c-g-t pass
        last = current;
    }
    return (short)( (cycles - 1) * 4 + current + 1 );
}

g) Combination of selection methods.
The heuristic rules, neural net and ANOVA model provide ways of pruning
or reducing the number of probes for monitoring the expression of genes. As these
methods do not necessarily produce the same results, or produce entirely independent
results, it may be advantageous to combine the methods. For example, probes may be
pruned or reduced if more than one method (e.g., two out of three) indicates the probe will
not likely produce good results.

Fig. 11 shows the flow of a process of increasing the number of probes for
monitoring the expression of genes after the number of probes has been reduced or pruned.
In one embodiment, a user is able to specify the number of nucleic acid probes that should
be placed on the chip to monitor the expression of each gene. As discussed above, it is
advantageous to reduce probes that will not likely produce good results; however, the
number of probes may be reduced to substantially less than the desired number of probes.

At step 402, the number of probes for monitoring multiple genes is reduced
by the heuristic rules method, neural net, ANOVA model, synthesis cycle pruning, or any
other method, or combination of methods. A gene is selected at step 404.

A determination is made whether the remaining probes for monitoring the
selected gene number greater than 80% (which may be varied or user defined) of the
desired number of probes. If yes, the computer system proceeds to the next gene at step
408 which will generally return to step 404.

If the remaining probes for monitoring the selected gene do not number
greater than 80% of the desired number of probes, a determination is made whether the
remaining probes for monitoring the selected gene number greater than 40% (which may
be varied or user defined) of the desired number of probes. If yes, a flag character is
appended to the end of the gene name to indicate that after pruning, the probes were
incomplete at step 412.

At step 414, the number of probes is increased by loosening the constraints
that rejected probes. For example, the thresholds in the heuristic rules may be increased by
1. Therefore, if previously probes were rejected if they had four As in a row, the rule may
be loosened to five As in a row.

A determination is then made whether the remaining probes for monitoring
the selected gene number greater than 80% of the desired number of probes at step 416. If
yes, a flag character is appended to the end of the gene name at step 412 to indicate that
the rules were loosened to generate the number of synthesized probes for that gene.

At step 420, a check is made to see if the probes for monitoring the selected
gene only conflict with one or two other genes. If yes, the full set of probes
complementary to the gene (or target sequence) are taken and pruned so that the probes
remaining are exactly complementary to the selected gene exclusively at step 422.

A determination is then made whether the remaining probes for monitoring
the selected gene number greater than 80% of the desired number of probes at step 424. If
yes, a flag character is appended to the end of the gene name at step 426 to indicate that
only a few genes were similar to the selected gene.

At step 428, the probes for monitoring the selected gene are not reduced by
conflicts at all. A determination is then made whether the remaining probes for monitoring
the selected gene number greater than 80% of the desired number of probes at step 430.
If yes, a flag character is appended to the end of the gene name at step 432 to indicate that
the probes include the whole family of probes perfectly complementary to the gene.

If there are still not 80% of the desired number of probes, an error is
reported at step 434. Any number of error handling procedures may be undertaken. For
example, an error message may be generated for the user and the probes for the gene may
not be stored. Alternatively, the user may be prompted to enter a new desired number of
probes.
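
One possible reading of the flow described above can be sketched as follows. The sketch assumes that each "yes" branch ends processing for the selected gene, and it takes the probe counts at each stage as inputs rather than recomputing them. The structure, the enumeration names and the function name are hypothetical, and the 80% and 40% cutoffs are the example values given above.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome recorded for a gene (hypothetical names). */
typedef enum {
    PROBES_OK,              /* enough probes after the initial pruning      */
    PROBES_INCOMPLETE,      /* more than 40% but not more than 80% (412)    */
    PROBES_RULES_LOOSENED,  /* enough after loosening the rules (414-416)   */
    PROBES_FEW_SIMILAR,     /* enough after exclusive pruning (420-426)     */
    PROBES_WHOLE_FAMILY,    /* enough with no conflict pruning (428-432)    */
    PROBES_ERROR            /* still too few probes (434)                   */
} ProbeSetStatus;

/* Probe counts for one gene at each stage of the flow. */
typedef struct {
    int after_pruning;             /* remaining after step 402              */
    int after_loosening;           /* remaining after step 414              */
    int conflicts_with_few_genes;  /* nonzero if only one or two other genes
                                      conflict (step 420)                   */
    int after_exclusive_pruning;   /* remaining after step 422              */
    int without_conflict_pruning;  /* remaining at step 428                 */
} ProbeCounts;

/* True if count is greater than percent% of desired (integer arithmetic). */
static int exceeds(int count, int percent, int desired) {
    return 100 * count > percent * desired;
}

ProbeSetStatus classify_probe_set(const ProbeCounts *c, int desired) {
    assert(c != NULL && desired > 0);
    if (exceeds(c->after_pruning, 80, desired))
        return PROBES_OK;
    if (exceeds(c->after_pruning, 40, desired))
        return PROBES_INCOMPLETE;
    if (exceeds(c->after_loosening, 80, desired))
        return PROBES_RULES_LOOSENED;
    if (c->conflicts_with_few_genes &&
        exceeds(c->after_exclusive_pruning, 80, desired))
        return PROBES_FEW_SIMILAR;
    if (exceeds(c->without_conflict_pruning, 80, desired))
        return PROBES_WHOLE_FAMILY;
    return PROBES_ERROR;
}
```

The returned status could then be used to choose the character appended to the gene name, or to trigger the error handling of step 434.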

V. Synthesis of High Density Arrays 

Methods of forming high density arrays of oligonucleotides, peptides and
other polymer sequences with a minimal number of synthetic steps are known. The
oligonucleotide analogue array can be synthesized on a solid substrate by a variety of
methods, including, but not limited to, light-directed chemical coupling, and mechanically
directed coupling. See Pirrung et al., U.S. Patent No. 5,143,854 (see also PCT Application
No. WO 90/15070) and Fodor et al., PCT Publication Nos. WO 92/10092 and WO
93/09668 which disclose methods of forming vast arrays of peptides, oligonucleotides and
other molecules using, for example, light-directed synthesis techniques. See also, Fodor et
al., Science, 251, 767-77 (1991). These procedures for synthesis of polymer arrays are
now referred to as VLSIPS™ procedures. Using the VLSIPS™ approach, one
heterogenous array of polymers is converted, through simultaneous coupling at a number
of reaction sites, into a different heterogenous array. See, U.S. Application Serial Nos.
07/796,243 and 07/980,523.

The development of VLSIPS™ technology as described in the above-noted
U.S. Patent No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092
... synthesis and screening ... of a nucleic acid containing a specific oligonucleotide
sequence.

... on a glass surface proceeds using automated phosphoramidite chemistry and chip
masking ... photolabile protecting group. Photolysis through a photolithographic mask is
used ... photoprotected nucleoside phosphoramidites. The phosphoramidites react only with
those sites which are illuminated (and thus exposed by removal of the photolabile blocking
group). Thus, the phosphoramidites only add to those areas selectively exposed from the
preceding step. These steps are repeated until the desired array of sequences have been
synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide ...
during synthesis and the order of addition of coupling reagents.

In the event that an oligonucleotide analogue with a polyamide backbone is
used in the VLSIPS™ procedure, it is generally inappropriate to use phosphoramidite
chemistry to perform the synthetic steps, since the monomers do not attach to one another
via a phosphate linkage. Instead, peptide synthetic methods are substituted. See, e.g.,
Pirrung et al. U.S. Pat. No. 5,143,854.

Peptide nucleic acids are commercially available from Biosearch, Inc.
(Bedford, MA) which comprise a polyamide backbone and the bases found in naturally
occurring nucleosides. Peptide nucleic acids are capable of binding to nucleic acids with
high specificity, and are considered "oligonucleotide analogues" for purposes of this
disclosure.

In addition to the foregoing, additional methods which can be used to
generate an array of oligonucleotides on a single substrate are described in co-pending
Applications Ser. No. 07/980,523, filed November 20, 1992, and 07/796,243, filed
November 22, 1991 and in PCT Publication No. WO 93/09668. In the methods disclosed
in these applications, reagents are delivered to the substrate by either (1) flowing within a
channel defined on predefined regions or (2) "spotting" on predefined regions. However,
other approaches, as well as combinations of spotting and flowing, may be employed. In
... other regions when the monomer solutions are delivered to the various reaction sites.

A typical "flow channel" method applied to the compounds and libraries of
the present invention can generally be described as follows. Diverse polymer sequences
are synthesized at selected regions of a substrate or solid support by forming flow channels
on a surface of the substrate through which appropriate reagents flow or in which
appropriate reagents are placed. For example, assume a monomer A is to be bound to
the substrate ... a reagent having the monomer A flows through or is placed in all or some
of the channel(s). The channels provide fluid contact to the first selected regions, thereby
binding the monomer A on the substrate directly or indirectly (via a spacer) in the first
selected regions.

Thereafter, a monomer B is coupled to second selected regions, some of
which may be included among the first selected regions. The second selected regions will
be in fluid contact with a second flow channel(s) through translation, rotation, or
replacement of the channel block on the surface of the substrate; through opening
or closing a selected valve; or through deposition of a layer of chemical or photoresist. If
necessary, a step is performed for activating at least the second regions. Thereafter, the
monomer B is flowed through or placed in the second flow channel(s), binding monomer B
at the second selected locations. In this particular example, the resulting sequences bound
to the substrate at this stage of processing will be, for example, A, B, and AB. The process
is repeated to form a vast array of sequences of desired length at known locations on the
substrate.

After the substrate is activated, monomer A can be flowed through some of
the channels, monomer B can be flowed through other channels, a monomer C can be
flowed through still other channels, etc. In this manner, many or all of the reaction regions
... be washed and/or reactivated. By making use of many or all of the available reaction
regions simultaneously, the number of washing and activation steps can be minimized.
One of skill in the art will recognize that there are alternative methods of
forming channels or otherwise protecting a portion of the surface of the substrate. For
example, according to some embodiments, a protective coating such as a hydrophilic or
hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of
the substrate to be protected, sometimes in combination with materials that ... further
prevented from passing outside of their designated flow paths.

According to other embodiments the channels will be formed by depositing
an electron or photoresist such as those used extensively in the semiconductor industry.
... beam resist such as poly(olefin sulfones), and the like (more fully described in Chapter 10
of Ghandi, VLSI Fabrication Principles, Wiley (1983)). According to these embodiments,
a resist is deposited, selectively exposed, and etched, leaving a portion of the substrate
exposed for coupling. These steps of depositing resist, selectively removing resist and
monomer coupling are repeated to form polymers of desired sequence at desired locations.
The "spotting" methods of preparing compounds and libraries of the present
invention can be implemented in much the same manner as the flow channel methods. For
example, a monomer A, or a coupled dimer, or trimer, or tetramer, etc., or a fully
synthesized material, can be delivered to and coupled with a first group of reaction regions
which have been appropriately activated. Thereafter, a monomer B can be delivered to and
reacted with a second group of activated reaction regions. Unlike the flow channel
embodiments described above, reactants are delivered by directly depositing (rather than
flowing) relatively small quantities of them in selected regions. In some steps, of course,
the entire substrate surface can be sprayed or otherwise coated with a solution. In preferred
embodiments, a dispenser moves from region to region, depositing only as much monomer
as necessary at each stop. Typical dispensers include a micropipette to deliver the
monomer solution to the substrate and a robotic system to control the position of the
micropipette with respect to the substrate. In other embodiments, the dispenser includes a
series of tubes, a manifold, an array of pipettes, or the like so that various reagents can be
delivered to the reaction regions simultaneously.



VI. Hybridization. 

Nucleic acid hybridization simply involves providing a denatured probe and
target nucleic acid under conditions where the probe and its complementary target can
form stable hybrid duplexes through complementary base pairing. The nucleic acids that
do not form hybrid duplexes are then washed away leaving the hybridized nucleic acids to
be detected, typically through detection of an attached detectable label. It is generally
recognized that nucleic acids are denatured by increasing the temperature or decreasing the
salt concentration of the buffer containing the nucleic acids, or in the addition of chemical
agents, or the raising of the pH. Under low stringency conditions (e.g., low temperature
and/or high salt and/or high target concentration) hybrid duplexes (e.g., DNA:DNA,
RNA:RNA, or RNA:DNA) will form even where the annealed sequences are not perfectly
complementary. Thus specificity of hybridization is reduced at lower stringency.
Conversely, at higher stringency (e.g., higher temperature or lower salt) successful
hybridization requires fewer mismatches.

One of skill in the art will appreciate that hybridization conditions may be
selected to provide any degree of stringency. In a preferred embodiment, hybridization is
performed at low stringency, in this case in 6X SSPE-T at about 40°C to about 50°C
(0.005% Triton X-100) to ensure hybridization and then subsequent washes are performed
at higher stringency (e.g., 1 X SSPE-T at 37°C) to eliminate mismatched hybrid duplexes.
Successive washes may be performed at increasingly higher stringency (e.g., down to as
low as 0.25 X SSPE-T at 37°C to 50°C) until a desired level of hybridization specificity is
obtained. Stringency can also be increased by addition of agents such as formamide.
Hybridization specificity may be evaluated by comparison of hybridization to the test
probes with hybridization to the various controls that can be present (e.g., expression level
...).

... Thus, in a preferred embodiment, the wash is performed at the highest stringency ...
approximately 10% of the background intensity. Thus, in a preferred embodiment, the
hybridized array may be washed at successively higher stringency solutions and read
between each wash. Analysis of the data ... will reveal a wash stringency above which the
hybridization pattern is not appreciably altered ... adequate signal for the particular
oligonucleotide probes of interest.

In a preferred embodiment, background signal is reduced by the use of a
detergent (e.g., C-TAB) or a blocking reagent (e.g., sperm DNA, cot-1 DNA, etc.) during
the hybridization to reduce non-specific binding. In a particularly preferred embodiment,
the hybridization is performed in the presence of about ... mg/ml DNA (e.g., ...). ... of skill
in the art (see, e.g., Chapter 8 in P. Tijssen, supra).

The stability of duplexes formed between RNAs or DNAs is generally in
the order of RNA:RNA > RNA:DNA > DNA:DNA, in solution. Long probes have better
duplex stability with a target, but poorer mismatch discrimination than shorter probes
(mismatch discrimination refers to the measured hybridization signal ratio between a
perfect match probe and a single base mismatch probe). Shorter probes (e.g., 8-mers)
discriminate mismatches very well, but the overall duplex stability is low.

Altering the thermal stability (Tm) of the duplex formed between the target
and the probe using, e.g., known oligonucleotide analogues allows for optimization of
duplex stability and mismatch discrimination. One useful aspect of altering the Tm arises
from the fact that adenine-thymine (A-T) duplexes have a lower Tm than guanine-cytosine
(G-C) duplexes, due in part to the fact that the A-T duplexes have 2 hydrogen bonds per
base-pair, while the G-C duplexes have 3 hydrogen bonds per base pair. In heterogeneous
oligonucleotide arrays in which there is a non-uniform distribution of bases, it is not
generally possible to optimize hybridization for each oligonucleotide probe
simultaneously. Thus, in some embodiments, it is desirable to selectively destabilize G-C
duplexes and/or to increase the stability of A-T duplexes. This can be accomplished,
e.g., by substituting guanine residues in the probes of an array which form G-C duplexes
with hypoxanthine, or by substituting adenine residues in probes which form A-T duplexes
with 2,6-diaminopurine, or by using the salt tetramethyl ammonium chloride (TMACl or
other alkylated ammonium salts) in place of NaCl.

Altered duplex stability conferred by using oligonucleotide analogue probes
can be ascertained by following, e.g., fluorescence signal intensity of oligonucleotide
analogue arrays hybridized with a target oligonucleotide over time. The data allow
optimization of specific hybridization conditions at, e.g., room temperature (for simplified
diagnostic applications in the future).

Another way of verifying altered duplex stability is by following the signal
intensity generated upon hybridization with time. Previous experiments using DNA targets
and DNA chips have shown that signal intensity increases with time, and that the more
stable duplexes generate higher signal intensities faster than less stable duplexes. The
signals reach a plateau or "saturate" after a certain amount of time due to all of the binding
sites becoming occupied. These data allow for optimization of hybridization, and
determination of the best conditions at a specified temperature.

Methods of optimizing hybridization conditions are well known to those of
skill in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular Biology,
Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993)).

VII. Detection Methods 

Methods for detection depend upon the label selected and are known to
those of skill in the art. Thus, for example, where a colorimetric label is used, simple
visualization of the label is sufficient. Where a radioactive labeled probe is used, detection
of the radiation (e.g., with photographic film or a solid state detector) is sufficient.

As explained above, the use of a fluorescent label is preferred because of its
extreme sensitivity and simplicity. Standard procedures are used to determine the
positions where interactions between a target sequence and a reagent take place. For
example, if a target sequence is labeled and exposed to an array of different
oligonucleotide probes, only those locations where the oligonucleotides interact with the
target (sample nucleic acid(s)) will exhibit significant signal. In addition to using a label,
other methods may be used to scan the matrix to determine where interaction takes place.
The spectrum of interactions can, of course, be determined in a temporal manner by
repeated scans of interactions which occur at each of a multiplicity of conditions.
However, instead of testing each individual interaction separately, a multiplicity of
sequence interactions may be simultaneously determined on a matrix.

B. Scanning System 

In a preferred embodiment, the hybridized array is excited with a light
source at the excitation wavelength of the particular fluorescent label and the resulting
fluorescence at the emission wavelength is detected. In a particularly preferred
embodiment, the excitation light source is a laser appropriate for the excitation of the
fluorescent label.

Detection of the fluorescence signal preferably utilizes a confocal
microscope, more preferably a confocal microscope automated with a computer-controlled
stage to automatically scan the entire high density array. The microscope may be equipped
with a phototransducer (e.g., a photomultiplier, a solid state array, a CCD camera, etc.)
attached to an automated data acquisition system to automatically record the fluorescence
signal produced by hybridization to each oligonucleotide probe on the array. Such
automated systems are described at length in U.S. Patent No: 5,143,854, PCT Application
WO 92/10092, and copending U.S.S.N. 08/195,889 filed on February 10, 1994. Use of laser
illumination in conjunction with automated confocal microscopy for signal detection
permits detection at a resolution of better than about 100 µm, more preferably better than
about 50 µm, and most preferably better than about 25 µm.

With the automated detection apparatus, the correlation of specific
positional labeling is converted to the presence on the target of sequences for which the
oligonucleotides have specificity of interaction. Thus, the positional information is
directly converted to a database indicating what sequence interactions have occurred. For
example, in a nucleic acid hybridization application, the sequences which have interacted
between the substrate matrix and the target molecule can be directly listed from the
positional information. A preferred detection system is described in PCT publication no.
WO90/15070; and U.S.S.N. 07/624,120. Although the detection described therein is a
fluorescence detector, the detector can be replaced by a spectroscopic or other detector.

The scanning system can make use of a moving detector relative to a fixed substrate, a
fixed detector with a moving substrate, or a combination. Alternatively, mirrors or other
apparatus can be used to transfer the signal directly to the detector. See, e.g., U.S.S.N.
07/624,120.

The detection method will typically also incorporate some signal processing
to determine whether the signal at a particular matrix position is a true positive or may be a
spurious signal. For example, a signal from a region which has actual positive signal may
tend to spread over into adjacent regions. Thus, the signal over the spatial region may be
evaluated pixel by pixel to determine the locations and the actual extent of positive signal.
A true positive signal should, in theory, show a uniform signal at each pixel location. Thus,
processing by plotting number of pixels with actual signal intensity should have a clearly
uniform signal intensity. Regions where the signal intensities show a fairly wide dispersion
may be particularly suspect and the scanning system may be programmed to more carefully
scan those positions. More sophisticated signal processing techniques can be applied to the
initial determination of whether a positive signal exists or not. See, U.S.S.N. 07/624,120
and discussion below in Section III.
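The pixel-by-pixel evaluation described above can be sketched, purely as an illustration, by measuring how widely the pixel intensities within one array position are dispersed. The use of the coefficient of variation, the cutoff `max_cv`, the function names, and the pixel values below are assumptions made for this example and are not taken from the cited applications.

```python
from statistics import mean, pstdev


def pixel_dispersion(pixels):
    """Coefficient of variation (standard deviation / mean) of the pixel
    intensities recorded for a single array position."""
    average = mean(pixels)
    if average == 0:
        return 0.0
    return pstdev(pixels) / average


def needs_rescan(pixels, max_cv=0.25):
    """Flag a position whose pixels are too dispersed to be a uniform signal.

    max_cv is a hypothetical cutoff chosen for this example.
    """
    return pixel_dispersion(pixels) > max_cv


# Hypothetical pixel intensities for two array positions.
uniform_position = [100, 102, 98, 101, 99, 100]
spillover_position = [100, 95, 40, 10, 5, 2]  # signal spreading in from a neighbor

print(needs_rescan(uniform_position))    # False
print(needs_rescan(spillover_position))  # True
```

A position flagged in this way would be a candidate for the more careful rescanning mentioned above.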

VIII. Ligation-Enhanced Signal Detection. 
A) General Ligation Reaction.

Ligation reactions can be used to discriminate between fully complementary
hybrids and those that differ by one or more base pairs, particularly in cases where the
mismatch is near the 5' terminus of the probe oligonucleotide. Use of a ligation reaction in
signal detection increases the stability of the hybrid duplex, improves hybridization
specificity (particularly for shorter probe oligonucleotides, e.g., 5 to 12 mers), and
optionally, provides additional sequence information.






Various components for use of ligation reaction(s) in combination with
generic difference arrays are illustrated in Figure 13a. In its simplest embodiment, the
probe oligonucleotide/ligation reaction system includes an array of oligonucleotide probes.
As discussed above, the oligonucleotide probes can be randomly selected, haphazardly
selected, composition biased, inclusive of all possible oligonucleotides of a particular
length, and so forth. The oligonucleotide probes can optionally include a predetermined
"constant" region (see Fig. 13a) which has substantially the same sequence for substantially
all of the probe oligonucleotides on the array.

Where the probe comprises a constant region it also preferably comprises a
"variable region" (see Fig. 13a) which can be randomly selected, haphazardly selected,
composition biased, inclusive of all possible oligonucleotides of a particular length, and so
forth. When constant and variable regions are present, a sample nucleic acid that
hybridizes to the oligonucleotide probe typically hybridizes to at least the variable region
and optionally to the constant region as well.

The probe oligonucleotide/ligation reaction system also optionally includes
a nucleic acid that is complementary to the constant region. This complement may be a
subsequence of a sample nucleic acid or a separate oligonucleotide. When the complement
to the constant region is a separate oligonucleotide, hybridization to the constant region
provides a ligation site (see Fig. 13a, ligation site A). The hybridized complement to the
constant region can optionally be permanently crosslinked to the constant region by the use
of cross-linking reagents (e.g., psoralens). The sample nucleic acid, and/or the ligatable
oligonucleotide can optionally be labeled. Where both are labeled, the labels can be the
same or distinguishable.

The probe oligonucleotide/ligation reaction system optionally includes a
ligatable oligonucleotide that can be ligated to the free terminus of the variable region (see
Fig. 13a, ligation site B). The ligatable oligonucleotide can be a single oligonucleotide of
known nucleotide sequence, a collection of nucleic acids of known sequence, or a pool of
all possible oligonucleotides of a particular length.

These various components of the probe oligonucleotide/ligation reaction
system can be combined in a variety of ways to increase the stability of the hybrid duplex,
and/or improve hybridization specificity (particularly for shorter probe oligonucleotides,
e.g., 5 to 12 mers), and/or provide sequence information. Various uses of the probe
oligonucleotide/ligation reaction system are described in detail below.

While Figure 13a illustrates ligation components in solid phase, similar
approaches and components can be used in solution phase.

It will be appreciated that sequences or subsequences of the probe
oligonucleotide where variable regions are present or absent can act as a primer site for
[...]
B) Ligation Reactions to Discriminate Mismatches at Probe Termini, Sample
Termini, or Both Termini

In one embodiment, a simple ligation reaction discriminates mismatches at
or near the terminus of the probe oligonucleotide (see Fig. 13b). Typically, the nucleic
acid fragments comprising the sample nucleic acid are longer than the probe
oligonucleotides and, upon hybridization, provide an overhang. When the array comprises
probe oligonucleotides attached through their 3' termini, the hybridized target (sample)
nucleic acid provides a 3' overhang. In this embodiment the target nucleic acid is not
necessarily labelled (see, e.g., Fig. 13b).

When the array of oligonucleotides is combined with the target nucleic acid
to form target:oligonucleotide hybrid complexes, the target:oligonucleotide hybrid
complexes are contacted with a ligase and a labelled, ligatable oligonucleotide or [...]
nucleic acids to ligatable probes can be performed sequentially; in [...]
embodiment both hybridization and ligation are performed simultaneously (i.e., the target,
ligatable oligonucleotide, and ligase are all added together). The pool may comprise [...]
length (e.g., 3 mer up to 12 mer) (see, e.g., Fig. 13b).

The ligation reaction of the labelled, ligatable probes to the phosphorylated
5' end of the oligonucleotide probes on the substrate will occur, in the presence of the
ligase, predominantly when the target:oligonucleotide hybrid has formed with correct
base-pairing near the 5' end of the oligonucleotide probe and where there is a suitable 3'
overhang of the target [...] (see Fig. 12). After the ligation reaction, the substrate is
washed (multiple times if necessary) under conditions suitable to remove the target nucleic
acid and the labeled unligated probes (e.g., above 40°C to 50°C, or under otherwise highly
stringent conditions).

Thereafter, a fluorescence image (e.g., a quantitative fluorescence image) of
the hybridization pattern is obtained as described above in Section VII(B). Labeled
oligonucleotide probes, i.e., the oligonucleotide probes which are complementary to the
target nucleic acid, are identified. The presence, absence, and/or intensity of the
hybridization signal provides information regarding the presence and level of the nucleic
acid sequence or subsequence in the nucleic acid sample as described above.

Any enzyme that catalyzes the formation of a phosphodiester bond at the
site of a single-stranded break in duplex DNA can be used to enhance discrimination
between fully complementary hybrids and those that differ by one or more base pairs.
Such ligases include, but are not limited to, T4 DNA ligase, ligases isolated from E. coli
and ligases isolated from other bacteria and bacteriophages. The concentration of the
ligase will vary depending on the particular ligase used, the concentration of target and
buffer conditions, but will typically range from about 50 units/ml to about 5,000 units/ml.
Moreover, the time in which the array of target:oligonucleotide hybridization complexes is
in contact with the ligase will vary. Typically, the ligase treatment is carried out for a
period of time ranging from minutes to hours. Methods of performing ligase
discrimination can be found in copending USSN 08/533,582, filed on October 18, 1995
and in Jackson et al. (1996) Nature Biotechnology, 14: 1685-1691.

It will be appreciated that the method described above primarily
discriminates mismatches at or near the 5' terminus of the surface bound probe
oligonucleotide and does little to discriminate mismatches at, or near, the 5' terminus of the
target (sample) nucleic acid (see Fig. 13b).

In another embodiment, a ligation can be used to discriminate mismatches
at, or near, the end of the sample nucleic acid (Fig. 13c). In this instance, the probe
oligonucleotides comprise a constant region and a variable region (e.g., the variable
regions can include all possible 8 mers as illustrated in Fig. 13c). A constant
oligonucleotide (complementary to the constant region or a subsequence thereof) is
hybridized to the constant region and cross-linked (e.g., covalently bound) at that location.
The remainder of the probe oligonucleotide (e.g., the variable region or subsequences
thereof and optionally a subsequence of the constant region) forms a 5' overhang to which
the nucleic acid sample can hybridize. Where there are no mismatches at or near the
terminus of the sample oligonucleotide, a ligation event then joins the sample
oligonucleotide to the constant oligonucleotide. Free nucleic acids are washed away
leaving bound hybridized sample oligonucleotides which can then be detected.

In still another embodiment, a double ligation (illustrated in Fig. 13d) can
be used to discriminate mismatches at or near the ends of both the probe oligonucleotide
and the target nucleic acid. In this approach, the probe oligonucleotides each comprise a
constant region and a variable region as described above in VIII(A). The surface bound
oligonucleotide probes are hybridized to a constant oligonucleotide having a sequence
which is complementary to the constant region of the oligonucleotide probe. The sample
(target) nucleic acids are contacted to the hybrid duplex in the presence of a ligase. Where
there is no terminal mismatch between the sample nucleic acid and the variable region, the
ligation is successful resulting in the ligation of the constant oligonucleotide to the sample
nucleic acid (see "first ligation" in Fig. 13d). This ligation thus discriminates mismatches
at the terminus of the sample nucleic acid.

The hybridized duplex is contacted with a pool of labeled ligatable
oligonucleotides. Where a ligatable probe is complementary to the overhang produced by
the hybridized sample nucleic acid and there are no mismatches at or near the free terminus
of the variable region of the probe oligonucleotide, a second ligation will attach the labeled
ligatable probe (Fig. 13d). The second ligation thus discriminates against mismatches
near the free terminus of the probe oligonucleotide. It will be appreciated that the various
hybridization and ligation reactions may be carried out sequentially or simultaneously, and
in a preferred embodiment are carried out simultaneously.

As with the previously described method, any enzyme that catalyzes the
formation of a phosphodiester bond at the site of a single-strand break in duplex DNA can
be used to enhance discrimination between fully complementary hybrids and those that
differ by one or more base pairs. Such ligases include, but are not limited to, T4 DNA
ligase, ligases isolated from E. coli and ligases isolated from other bacteria or
bacteriophages. The concentration of the ligase will vary depending on the particular
ligase used, the concentration of target and buffer conditions, but will typically range from
about 50 units/ml to about 5,000 units/ml. Moreover, the time in which the array of
target:oligonucleotide hybridization complexes is in contact with the ligase will vary.
Typically, the ligase treatment is carried out for a period of time ranging from minutes to
hours. In addition, it will be readily apparent to those of skill in the art that the two
ligation reactions can be performed in a single reaction mix that contains: target
oligonucleotides; constant oligonucleotides; a pool of labeled, ligatable probes; and a
ligase. The first ligation reaction generally occurs only if the terminus of the target is
complementary to the variable region of the oligonucleotide probe. Similarly, the second
ligation reaction, which adds a label to the probe, generally occurs efficiently only if the
first ligation reaction was successful and if the ligated target is complementary to the 5'
end of the probe. Thus, this method is advantageous in that it allows a shorter variable
probe region to be used, increases probe:target specificity and removes the necessity of
labeling the target. Dual ligation methods of this sort are described in detail in copending
USSN 08/533,582, filed on October 18, 1995.

In another embodiment, after hybridization of the nucleotide
complement to the constant region of the probe oligonucleotides, the hybrid duplex
is cross-linked so that the complement is also permanently attached to the solid support.
In this embodiment, the use of a ligatable oligonucleotide is optional. The sample nucleic
acid may itself be labeled thereby permitting detection of the ligated sample nucleic acids.

Methods for cross-linking nucleic acids are well known to those of skill in
the art. Such methods include, but are not limited to, baking, exposure to UV, exposure to
ionizing radiation, and contact with chemical cross-linking reagents. In a particularly
preferred embodiment, cross-linking is accomplished by the formation of covalent bonds
with chemical cross-linking reagents. Preferred cross-linking reagents include bifunctional
cross-linking reagents and cross-linking is accomplished by chemical or photoactivation of
the cross-linking reagent with the nucleic acids. The reagents may be applied after
formation of hybrid duplexes, but in a preferred embodiment, the cross-linker is initially
attached to either the probe or complement (to the constant region) nucleic acids before
hybridization.

The cross-linking reagent can be any bifunctional molecule which
covalently cross-links the tester nucleic acid to a hybridized driver nucleic acid. Generally
the cross-linking agent will be a bifunctional photoreagent which will be monoadducted to
the tester or driver nucleic acids leaving a second photochemically reactive residue which
can bind covalently to the corresponding hybridized nucleic acid upon photoexcitation.
The cross-linking molecule may also be a mixed chemical and photochemical bifunctional
reagent which will be non-photochemically bound to the probe or tester nucleic acids via a
chemical reaction such as alkylation, condensation, or addition, followed by photochemical
binding to the corresponding hybridized nucleic acid. Bifunctional chemical cross-linking
molecules activated either catalytically or by high temperature following hybridization
may also be employed.

Examples of bifunctional photoreagents include furocoumarins,
benzodipyrones, and bis azides such as bis-azido ethidium bromide. Examples of mixed
bifunctional reagents with both chemical and photochemical binding moieties include
haloalkyl-furocoumarins, haloalkyl benzodipyrones, haloalkyl-coumarins and various
azido nucleoside triphosphates.

Particularly preferred cross-linkers include linear furocoumarins (psoralens)
such as 8-methoxypsoralen, 5-methoxypsoralen and 4,5',8-trimethylpsoralen, and the like.
Other suitable cross-linkers include cis-benzodipyrone and trans-benzodipyrone. The
cross-linker known commercially as Sorlon is also suitable. For a detailed description of
the cross-linking of hybridized nucleic acids see WO 85/02628.

The foregoing enhancement discrimination methods involving the use of
ligation reactions can be used in all instances where improved discrimination between fully
complementary hybrids and those that differ by one or more base pairs would be helpful.
More particularly, such methods can be used to more accurately determine the sequence of
a nucleic acid (i.e., such methods can be used in conjunction with a second sequencing
[...]). [...] restrict the way in which an array of target:oligonucleotide hybrid complexes
[...] treated with a ligase and a pool of labeled, ligatable probes to improve hybridization
signals on high density oligonucleotide arrays.

B) Ligation Reaction to Add Sequence Information. 

i) Extended sequence information from simple ligation.
The ligation reactions described above can also be used to increase the
sequence information obtained regarding the hybridized nucleic acid. It will be
appreciated that the hybridized sample nucleic acid has a sequence or subsequence
complementary to the hybridized probe oligonucleotide. Thus a hybridization event
provides sequence information that can be used to identify the nucleic acids (e.g., gene
transcript) present in the sample. The amount of sequence information obtained is
determined by the length of the probe oligonucleotide. Thus, where the probe
oligonucleotide is an 8 mer, 8 nucleotides of sequence information is obtained.

However, the ligation discrimination reactions described above can be used
to provide additional sequence information. In this embodiment, rather than every possible
ligatable oligonucleotide of a given length, the array and sample nucleic acids are
hybridized to predetermined ligatable oligonucleotides in which the nucleotides at one or
more positions are known. Successful hybridization and ligation of the labeled
oligonucleotide thus indicates that the hybridized sample nucleic acid has nucleotides
complementary to the ligatable oligonucleotide in addition to the probe oligonucleotide.
Thus for example, where the probe oligonucleotide is an 8 mer and specific
6 mer ligatable probes are used, the resulting hybridization will provide 14 nucleotides
worth of sequence information.
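As a hedged illustration of how the sequence information accumulates, the following Python sketch derives the target subsequence implied by a positive signal from a known probe and a known ligatable oligonucleotide. It assumes that both oligonucleotides are written 5' to 3' and that the ligatable oligonucleotide is joined to the 5' end of the probe; the function names and the two example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def inferred_target(probe, ligated_oligo):
    """Target subsequence (5' to 3') implied by hybridization plus ligation.

    Assumes the ligatable oligonucleotide is joined to the 5' end of the
    probe, so the extended strand reads ligated_oligo + probe.
    """
    extended_strand = ligated_oligo + probe
    return reverse_complement(extended_strand)


probe = "AACGTGCT"       # hypothetical 8 mer probe
ligated_oligo = "GATTAC"  # hypothetical 6 mer ligatable oligonucleotide

target = inferred_target(probe, ligated_oligo)
print(target)       # AGCACGTTGTAATC
print(len(target))  # 14
```

The 14 nucleotides reported correspond to the 8 nucleotides contributed by the probe plus the 6 contributed by the ligated oligonucleotide.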

Where different ligatable oligonucleotides are used in this context, it is
desirable to distinguish between the various ligated oligonucleotides. This can be
accomplished by sequential ligations with each different species of ligatable probe
followed by reading of the array. Alternatively, each species of ligatable oligonucleotide
can be labeled with a different detection label allowing simultaneous ligation and
subsequent detection of the various different labels.

ii) Use of a generic ligation GeneChip for interrogating sequences
adjacent to restriction sites in a complex (target) sample nucleic acid.
The generic difference arrays can be used to fingerprint complex DNA
clones or to monitor the complex pattern of gene expression from a given source. In
fingerprinting, a nucleic acid sequence (e.g. an 8 bp sequence) adjacent to a given
restriction enzyme site is interrogated. In fingerprinting, a restriction enzyme is selected
which cleaves the target at a frequency dependent on the length of the recognition
sequence. The restriction digest thus generates nucleic acid fragments approximately
uniformly distributed along the genomic DNA. For instance, a 4-cutter like Hsp92II would
cut a target about once every several hundred basepairs, whereas a 6-cutter, like SacI,
would cut a target about once every several thousand (4,000) basepairs. With restriction
enzyme fragments, the individual fragments are typically non-overlapping and average
several thousand basepairs in length. For the purposes of fingerprinting, with a 6-cutter
restriction enzyme it is possible to examine (2000-3000 fragments x 4000 bases/fragment =
8-12 million basepairs) per target. This indicates that it is possible to routinely sort an
8-12 million basepair target in a high density array to measure expression differences or to
monitor gene expression (see, e.g., Fig. 14c) thereby providing a characteristic expression
"fingerprint" or abundance difference fingerprint for each restriction digest of the sample
nucleic acid. The fingerprinting methods thus provide means to subsample a nucleic acid
population in a [...]

[...] region as described above. [...] restriction recognition site (see, e.g., Fig. [...])
[...] shortened by the appropriate number of bases. However, in [...] enzyme. Preferred
restriction enzymes leaving only 0, 1, or 2 bases at the 5' end provide a [...] leave the
same recognition base at the 5' end. For example, ApaI, KpnI, and BanII all leave just a C
at the 5' end, making these compatible [...].
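The cutting frequencies and the 8-12 million basepair estimate given above can be checked with a short calculation. The Python sketch below is illustrative only; it assumes a random sequence with equal base frequencies, in which a recognition site of length n occurs on average once every 4^n basepairs, and the function names are hypothetical.

```python
def expected_site_spacing(recognition_length):
    """Average spacing (bp) between recognition sites of the given length,
    assuming random sequence with equal frequencies of A, C, G and T."""
    return 4 ** recognition_length


def basepairs_examined(fragment_count, mean_fragment_length):
    """Total basepairs represented by a set of non-overlapping fragments."""
    return fragment_count * mean_fragment_length


print(expected_site_spacing(4))        # 256   (a 4-cutter: several hundred bp)
print(expected_site_spacing(6))        # 4096  (a 6-cutter: about 4,000 bp)
print(basepairs_examined(2000, 4000))  # 8000000
print(basepairs_examined(3000, 4000))  # 12000000
```

Real genomes depart from the equal-frequency assumption, so observed spacings will differ from these averages.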
Restriction enzymes and their characteristic recognition/cleavage sites are well
known to those of skill in the art.

The restriction digest is then hybridized and ligated to the high density array,
preferably in the presence of a complement to the constant region, using standard
conditions (e.g., 30°C o/n, 800 U T4 ligase, T4 ligase buffer). The hybridization in effect
sorts (locates and/or localizes) the sample nucleic acids, the position of the sample nucleic
acids being defined by the sequence of the bases adjacent to the restriction site. The
hybridization data can be used directly in an expression monitoring method as [...]
acids for generic difference screening.

In a preferred embodiment, one of two formats is used. In Format I, the
[...] the sample nucleic acid and, optionally, the complement to the constant region) is
[...] (e.g., by cross-linking) to the complement (e.g., by the use of psoralen). The
complementary [...] 1 N NaOH). These cross-linked fragments can then be [...]
a deoxynucleic acid sample, the DNA is restriction digested as described above, and then
directly hybridized/ligated to the generic difference array (see discussion above and
Fig. 14d). In the case of Format I assays the [...] is preferably not labeled and instead
serves as a hybridization probe in a second round of hybridization of labeled sample
nucleic acids to the high density array. [...] enzymes do not ligate to the high density
array, alkaline phosphatase can be used to treat the sample nucleic acids before restriction
enzyme digestion.

The principle behind differential display is to generate a set of randomly
primed amplification (e.g. PCR) fragments from a first strand cDNA population
transcribed from RNA using anchor primers of the form:

(T)nVA, (T)nVG, (T)nVC and (T)nVT

in which V is A, G, or C, and n ranges from about 6 to about 30, preferably from about 8 to
about 20 and more preferably about 10 to about 16 with n=14 being most preferred.
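For illustration, the four anchor primer forms can be written out explicitly for a chosen value of n. The Python sketch below simply enumerates the sequences implied by the formula above; the function name and the dictionary keys are hypothetical, and n=14 is used because it is stated to be most preferred.

```python
def anchor_primer_sets(n=14):
    """Enumerate the anchored oligo(dT) primers (T)nVA, (T)nVG, (T)nVC and
    (T)nVT, where V is A, G, or C (never T)."""
    primer_sets = {}
    for last_base in "AGCT":
        label = f"(T){n}V{last_base}"
        primer_sets[label] = ["T" * n + v + last_base for v in "AGC"]
    return primer_sets


primer_sets = anchor_primer_sets(14)
for label, primers in primer_sets.items():
    print(label, primers)
```

Each of the four forms expands to three concrete sequences, giving twelve primers of n + 2 bases in total.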

[...] sets of cDNA transcripts are represented in a particular nucleic acid fragment set.

The method is illustrated in Figures 16a through 16c. First strand cDNA is
synthesized by reverse transcription of poly(A) mRNA using an anchored poly(T) primer
according to standard methods (Fig. 16a). The first strand DNA acts as a template for
amplification (e.g., via PCR) using upstream primers comprising an engineered restriction
site and one or more degenerate bases (N = A, C, G, T) at the [...]. Randomly primed PCR
(e.g., using anchor primers (T)nVA, (T)nVG, (T)nVC, (T)nVT and a random primer, e.g.,
[...] CATGAGCTCNN) [...]. The resulting amplification fragments are then digested with a
[...] sample nucleic acids are then hybridized to a generic difference screening array as
described above.

[...] thereby allowing use of the generic difference screening methods of this invention.
In one embodiment, the probe oligonucleotides [...] restriction site on the sample nucleic
acids if present. The [...] analysis proceeds as described above.

The method allows analysis of several thousand or even more bands
(nucleic acids) simultaneously. Furthermore, sequence information is also provided on the
differentially abundant nucleic acid. For example, where the cleavage is with SacI
providing a 9 base tail (CATGAGCTC), the array can comprise probe oligonucleotides
having a complementary 9 base constant region and variable regions comprising all
possible 8 mers. This provides 17 nucleotides of sequence information for each
hybridization (9 mer constant + 8 mer variable).

iv) Use of ligation to extract additional sequence information from
restriction selected nucleic acid hybridizations.

Ligation reactions can also be used in combination with restriction digests
to subsample the sample nucleic acids at approximately uniform intervals and
simultaneously provide additional sequence information using a ligation reaction. In this
[...] a nucleic acid sequence complementary to the sense or antisense strand of a
restriction site (see, e.g., Fig. 14). The sample nucleic acids are digested randomly with a
DNAse or specifically with a restriction endonuclease (e.g., Sau3A, [...]) and then
hybridized to the high density array. Only those nucleic acids having termini
complementary to the constant regions will bind to the probe oligonucleotides. Thus, the
restriction fragments will be preferentially selected.

The array is also hybridized with a pool of ligatable oligonucleotides
comprising all possible oligonucleotides of a particular length (e.g., a 6 mer) in the
presence of a ligase thereby ligating the complementary ligatable oligonucleotides to the
terminus of the probe oligonucleotide. This produces probe oligonucleotides increased in
length by the length of the ligatable oligonucleotide and complementary to nucleic acids
known to be present in the nucleic acid sample.

The DNA is then stripped off of the array and the elongated probes are used
to perform generic difference screening of the nucleic acid samples as described above.
When probes corresponding to nucleic acid differentially expressed in the various samples
are identified, the known probe sequence can be used to identify the nucleic acids that are
differentially expressed in the samples.

In one embodiment, this is accomplished by producing 4 primer
oligonucleotides comprising the constant region plus the known variable region and an
additional nucleotide (A, G, C, or T) on one end. The genomic clone is then digested with
a second restriction enzyme and ligated to an adaptor sequence. Using the 4 primer
oligonucleotides and the adapter sequence as primers the genomic sequence of interest can
be amplified (e.g., using PCR) from the genomic clone. The PCR amplified sequence can
[...] of interest.

For example, in one embodiment, a 10 mer high density array is designed so
that it comprises all possible combinations of 10 mer oligonucleotides (4^10 = 1,048,576
nucleic acids) and, at the beginning of each oligonucleotide, a constant sequence (e.g.,
3'-CTAGT-5', the first 4 bases of which are complementary to the recognition sequence of a
restriction enzyme (e.g., Sau 3A plus one base T)).

Complete digestion of a large genomic clone or a simplified cDNA library
(e.g., a cDNA library that only includes parts of the 5' end or 3' end [...]) with, for
example, a 4 cutter enzyme (illustrated herein by Sau 3A) generates DNA fragments
with a 5' overhang sequence (for Sau 3A, the overhang is GATC). The recognition site
exists at approximately every 500 bp.

When the DNA fragments are hybridized with the 10 mer chip in the
presence of all possible combinations of a ligatable oligonucleotide of a particular length
(e.g., a 6 mer) and a T4 DNA ligase, the ligatable oligonucleotide is ligated onto the probe
oligonucleotide. [...] is performed as described above. This permits identification of probe
oligonucleotides that hybridize to nucleic acids that are present at different levels in the
tested [...]

Based on the 15 bp sequence in this example (5 mer constant region bases
plus 10 mers) from the probes of interest in the array, four 16 base primers are produced by
adding one base (A, G, C, or T) at the end. Using these primers and adaptor sequences as
primers, the genomic sequence of interest can be amplified. The amplified sequence can
then be used to probe a cDNA library to obtain the whole cDNA of interest as described
above.
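The construction of the four primers can be sketched as follows. This Python fragment is an illustration under stated assumptions: the known sequence shown is a made-up 15 base string standing in for the constant region bases plus the 10 mer read from a probe of interest, the function name is hypothetical, and the added base is simply appended to the end of the string without asserting which physical terminus that represents.

```python
def primers_with_added_base(known_sequence):
    """Return four primers, each the known sequence plus one added base
    (A, G, C, or T) at the end."""
    return [known_sequence + base for base in "AGCT"]


# Hypothetical 15 base sequence (5 constant region bases + 10 variable bases).
known_sequence = "TGATC" + "ACGTTAGCCA"

primers = primers_with_added_base(known_sequence)
for primer in primers:
    print(primer, len(primer))
```

Because only one of the four added bases can match the genomic sequence at that position, each primer would be used together with the adaptor primer in its own amplification.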



IX. Signal Evaluation. 

A) Signal Evaluation for expression monitoring.

One of skill in the art will appreciate that methods for evaluating the
hybridization results vary with the nature of the specific probe nucleic acids used as well as
the controls provided. In the simplest embodiment, simple quantification of the
fluorescence intensity for each probe is determined. This is accomplished simply by
measuring probe signal strength at each location (representing a different probe) on the
high density array (e.g., where the label is a fluorescent label, detection of the amount of
fluorescence (intensity) produced by a fixed excitation illumination at each location on the
array). Comparison of the absolute intensities of an array hybridized to nucleic acids from
a "test" sample with intensities produced by a "control" sample provides a measure of the
relative abundance of the nucleic acids that hybridize to each of the probes.

One of skill in the art, however, will appreciate that hybridization signals
will vary in strength with efficiency of hybridization, the amount of label on the sample
nucleic acid and the amount of the particular nucleic acid in the sample. Typically nucleic
acids present at very low levels (e.g., < 1 pM) will show a very weak signal. At some low
level of concentration, the signal becomes virtually indistinguishable from background. In
evaluating the hybridization data, a threshold intensity value may be selected below which
a signal is not counted, as being essentially indistinguishable from background.

Where it is desirable to detect nucleic acids expressed at lower levels, a
lower threshold is chosen. Conversely, where only high expression levels are to be
evaluated a higher threshold level is selected. In a preferred embodiment, a suitable
threshold is about 10% above that of the average background signal.
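As a minimal sketch of such a threshold, the following Python fragment counts a probe signal only when it exceeds the average background by a chosen margin. The 10% margin follows the preferred embodiment above; the function name, the probe names, and all intensity values are assumptions made for the example.

```python
from statistics import mean


def signals_above_background(intensities, background, margin=0.10):
    """Map each probe to True when its intensity exceeds the average
    background signal by more than `margin` (0.10 = 10%)."""
    threshold = mean(background) * (1.0 + margin)
    return {probe: value > threshold for probe, value in intensities.items()}


# Hypothetical background readings and probe intensities (arbitrary units).
background = [48.0, 50.0, 52.0, 50.0]
intensities = {"probe_1": 400.0, "probe_2": 54.0, "probe_3": 60.0}

calls = signals_above_background(intensities, background)
print(calls)  # {'probe_1': True, 'probe_2': False, 'probe_3': True}
```

Raising or lowering `margin` corresponds to selecting a higher or lower threshold as discussed above.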

In addition, the provision of appropriate controls permits a more detailed
analysis that controls for variations in hybridization conditions, cell health, non-specific
binding and the like. Thus, for example, in a preferred embodiment, the hybridization
array is provided with normalization controls as described above in Section IV(A)(2).
These normalization controls are probes complementary to control sequences added in a
known concentration to the sample. Where the overall hybridization conditions are poor,
the normalization controls will show a smaller signal reflecting reduced hybridization.
Conversely, where hybridization conditions are good, the normalization controls will
provide a higher signal reflecting the improved hybridization. Normalization of the signal
derived from other probes in the array to the normalization controls thus provides a control
for variations in hybridization conditions. Typically, normalization is
accomplished by dividing the measured signal from the other probes in the array by the
average signal produced by the normalization controls. Normalization may also include
correction for variations due to sample preparation and amplification. Such normalization
may be accomplished by dividing the measured signal by the average signal from the
sample preparation/amplification control probes (e.g., the BioB probes). The resulting
values may be multiplied by a constant value to scale the results.
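As a minimal sketch of the normalization just described, each probe signal is divided by the average signal of the control probes and optionally multiplied by a constant. The function name `normalize`, the `scale` parameter, and the example values are assumptions made for illustration.

```python
from statistics import mean


def normalize(signals, control_signals, scale=1.0):
    """Divide each probe signal by the mean control signal, then rescale."""
    control_mean = mean(control_signals)
    if control_mean <= 0:
        raise ValueError("mean control signal must be positive")
    return {probe: scale * value / control_mean
            for probe, value in signals.items()}


# Hypothetical values: two probes and two normalization-control readings.
normalized = normalize({"a": 200.0, "b": 50.0}, [80.0, 120.0])
```

The same function could be applied a second time with the sample preparation/amplification control signals as `control_signals`.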

As indicated above, the high density array can include mismatch controls or,
in the case of generic difference screening arrays, pairs of related oligonucleotide probes
differing in one or more preselected nucleotides. In preferred expression monitoring
arrays there is a mismatch control having a central mismatch for every probe (except the
normalization controls) in the array. It is expected that after washing in stringent
conditions, where a perfect match would be expected to hybridize to the probe, but not to
the mismatch, the signal from the mismatch controls should primarily reflect non-specific
binding or the presence in the sample of a nucleic acid that hybridizes with the mismatch.
In expression monitoring analyses, where the probe in question and its corresponding
mismatch control both show high signals, or the mismatch shows a higher signal than its
corresponding test probe, the signal from those probes is preferably ignored. The
difference in hybridization signal intensity between the target specific probe and its
corresponding mismatch control is a measure of the discrimination of the target-specific
probe. Thus, in a preferred embodiment, the signal of the mismatch probe is subtracted
from the signal from its corresponding test probe to provide a measure of the signal due to
specific binding of the test probe. Similarly, as discussed below, in generic difference
screening, the difference between probe pairs is calculated.

The concentration of a particular sequence can then be determined by
measuring the signal intensity of each of the probes that bind specifically to that nucleic
acid and normalizing to the normalization controls. Where the signal from the probes is
greater than the mismatch, the mismatch is subtracted. Where the mismatch intensity is
equal to or greater than its corresponding test probe, the signal is ignored. The expression
level of a particular gene can then be scored by the number of positive signals (either
absolute or above a threshold value), the intensity of the positive signals (either absolute or
above a selected threshold value), or a combination of both metrics (e.g., a weighted
average).
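The subtraction and scoring steps can be illustrated with the following sketch. It assumes already-normalized intensities supplied as (test probe, mismatch) tuples; the function names and the choice of reporting the count together with the mean positive intensity are illustrative assumptions, not the scoring scheme of any particular embodiment.

```python
def specific_signals(pairs):
    """Return test-minus-mismatch differences, ignoring pairs where the
    mismatch intensity is equal to or greater than the test probe."""
    return [test - mismatch for test, mismatch in pairs if test > mismatch]


def expression_score(pairs, threshold=0.0):
    """Return (number of positive signals, mean positive intensity)."""
    positives = [d for d in specific_signals(pairs) if d > threshold]
    count = len(positives)
    mean_intensity = sum(positives) / count if count else 0.0
    return count, mean_intensity


# Hypothetical intensities for four probe pairs of one gene.
pairs = [(500.0, 100.0), (300.0, 320.0), (250.0, 50.0), (80.0, 80.0)]
score = expression_score(pairs)
```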

It is a surprising discovery of this invention, that normalization controls are 
often unnecessary for useful quantification of a hybridization signal. Thus, where optimal 
probes have been identified in the two step selection process as described above, in Section 
IV(B)(ii)(a), the average hybridization signal produced by the selected optimal probes 
provides a good quantified measure of the concentration of hybridized nucleic acid.

B) Signal evaluation for generic difference screening. 

Signal evaluation for generic difference screening is performed in 
essentially the same manner as expression monitoring described above. However, data is 
evaluated on a probe-by-probe basis rather than a gene by gene basis. 

In a preferred embodiment, for each probe oligonucleotide the signal
intensity difference between the members of each probe pair (k) is calculated as:

$$X_{ijk1} - X_{ijk2}$$

where X is the hybridization intensity of the probe, i indicates which sample (in this case
sample 1 or 2), and j indicates which replicate for each sample (in the case of Example 7
where there were two replicates for each nucleic acid sample, j is 1 or 2), k is the probe
pair ID number (in the case of Example 7, 1 . . . 34,320), and 1 indicates one member of the
probe pair, while 2 indicates the other member of the probe pair.

The difference between the signal intensity differences for each probe pair
between the replicates for each sample is then calculated. Thus, for example, the
differences between replicate 1 and 2 of sample 1 (e.g., the normal cell line) and
between replicate 1 and replicate 2 of sample 2 (e.g., the tumor cell line) for each probe are
calculated as (shown here for sample 1)

$$(X_{11k1} - X_{11k2}) - (X_{12k1} - X_{12k2})$$

for k = 1 to the total number of probes.

The replicates can be normalized to each other as:

$$(X_{11k1} - X_{11k2}) / (X_{12k1} - X_{12k2})$$ for sample 1 or
$$(X_{21k1} - X_{21k2}) / (X_{22k1} - X_{22k2})$$ for sample 2

for all probe pairs (i.e., after normalization, the average ratio should approximate 1).
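One way to read the replicate normalization is sketched below: the pair differences of the two replicates are compared as ratios, and replicate 2 is rescaled by the average ratio so that the average ratio becomes 1. The rescaling step, the skipping of zero denominators, and all names and values are assumptions made for illustration.

```python
from statistics import mean


def pair_differences(pairs):
    """Intensity difference between the two members of each probe pair."""
    return [x1 - x2 for x1, x2 in pairs]


def replicate_scale(diff_rep1, diff_rep2):
    """Average replicate-1 / replicate-2 ratio over all probe pairs
    (pairs with a zero denominator are skipped, an assumption)."""
    return mean(a / b for a, b in zip(diff_rep1, diff_rep2) if b != 0)


def replicate_differences(diff_rep1, diff_rep2):
    """Per-pair difference between the replicates after rescaling
    replicate 2 so that the average ratio is 1."""
    scale = replicate_scale(diff_rep1, diff_rep2)
    return [a - scale * b for a, b in zip(diff_rep1, diff_rep2)]


# Hypothetical intensities for three probe pairs in two replicates.
rep1 = pair_differences([(120.0, 20.0), (60.0, 10.0), (45.0, 5.0)])
rep2 = pair_differences([(60.0, 10.0), (30.0, 5.0), (25.0, 5.0)])
residual = replicate_differences(rep1, rep2)
```

In this example replicate 1 is uniformly twice as bright as replicate 2, so the residual differences vanish after normalization.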

Finally, the difference between sample 1 and sample 2, averaged over the two
replicates, is calculated. This value is calculated as

$$\frac{X_{21k1} + X_{22k2}}{2} - \frac{X_{11k1} + X_{12k2}}{2}$$

after normalization between the two samples based on the average ratio of

$$\frac{(X_{21k1} + X_{22k2})/2}{(X_{11k1} + X_{12k2})/2}.$$

These data are plotted as a function of probe number (ID) and probes having differentially
hybridized nucleic acids are readily discernible (see, e.g., Fig. 16c).

However, the data may also be filtered to reduce background signal. In this
instance, after normalization between replicates (see above), the ratio is calculated as
follows: If the absolute value of $(X_{11k1} - X_{11k2}) / (X_{12k1} - X_{12k2})$ is greater than 1, then the
ratio = $(X_{11k1} - X_{11k2}) / (X_{12k1} - X_{12k2})$; else the ratio = $(X_{12k1} - X_{12k2}) / (X_{11k1} - X_{11k2})$ (the inverse).
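The ratio rule amounts to always reporting whichever of the two quotients has magnitude of at least 1. A short sketch follows; the function name is hypothetical, and returning `None` when either pair difference is zero is an assumption, since that case is not addressed above.

```python
def filtered_ratio(numerator, denominator):
    """Return numerator/denominator when its absolute value exceeds 1,
    otherwise the inverse quotient."""
    if numerator == 0 or denominator == 0:
        return None  # assumption: the ratio is treated as undefined
    ratio = numerator / denominator
    return ratio if abs(ratio) > 1 else 1.0 / ratio
```

For example, pair differences of 100 and 50 give the same value whichever replicate is in the numerator, which makes increases and decreases directly comparable on one plot.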

The ratio of replicate 1 and 2 of sample 2 for the difference of each
oligonucleotide pair is calculated in the same way, but based on the absolute value of

$$(X_{22k1} - X_{22k2}) / (X_{21k1} - X_{21k2}).$$

Finally, as above, the ratio of sample 1 and sample 2 averaged over two replicates for the
difference of each oligonucleotide pair is calculated as in Fig. 17a, but based on the
absolute value of

$$\frac{(X_{21k1} + X_{22k2})/2}{(X_{11k1} + X_{12k2})/2}$$ and

$$\frac{(X_{11k1} + X_{12k2})/2}{(X_{21k1} + X_{22k2})/2}$$

after normalization as described above.

The oligonucleotide pairs that show the greatest differential hybridization
between the two samples can be identified by sorting the observed hybridization ratio and
difference values. The oligonucleotides that show the largest change (increase or decrease)
can be readily seen from the ratio plot (see, e.g., Fig. 17c).

X. Identification of Gene Whose Expression Is Altered. 

As indicated above, the nucleic acid sequences of the probe
oligonucleotides comprising the high density arrays are known. The sequences of the
probes showing the largest hybridization differences (and families of such differences) can
be used to identify the differentially expressed genes in the compared samples by any of a
number of means.

Thus, for example, sequences of the differentially hybridizing probes may
be used to search a nucleic acid database (e.g., by a BLAST or related search of the
fragments against all known sequences). Alternatively, some sequence reconstruction
using the families of probes that change by similar amounts can also be done. The
database search for known genes that include sequences complementary (or nearly
complementary) to the probes that change the most is not difficult and, because it is
generally easier than sequence reconstruction, is the preferred method for identifying the
differentially expressed sequences.

In another embodiment, the differential hybridization pattern indicates that
there are significant differences in the overall expression profile(s) between the tested
samples, and identifies probes that are specific for the differences. These probes can be
used as specific affinity reagents to extract from the samples the parts that differ. This can
be accomplished in several ways:

In one approach, the material hybridized to the probes that show the greatest
differences between samples can be micro-extracted from the high density array. For
example, the hybridized nucleic acids can be removed using small capillaries.
Alternatively, probes that are anchored to the chip with a photolabile linker can be released
by selective irradiation at the desired parts of the high-density array.

In another approach, because the sequence of all the probes on the
high-density array is known, and the probes that hybridize differentially have been identified,
the latter can be used as affinity reagents to extract the nucleic acids that differentially
hybridize in the test samples. Once the differentially hybridizing probes are identified in
the array, the probe (or probes) can be synthesized on beads (or other solid support) and
hybridized to the samples (not necessarily fragmented for this step; full length clones may
be desirable). The material that is extracted can be cloned and/or sequenced, according to
standard methods known to those of skill in the art, to obtain the desired information about
the differentially expressed species (e.g., clones can be screened with labeled
oligonucleotides to determine ones with appropriate inserts, and/or randomly chosen and
sequenced).

In still another approach, the sequence of the hybridized probes of interest
can be used to generate amplification primers (e.g., reverse transcription and/or PCR
primers). The differentially expressed sequence can then be amplified and used as a probe
to a genomic or cDNA library, using sequence specific primers determined from the
array as described above (e.g., a primer based on poly A or an added sequence). Examples of
appropriate cloning and sequencing techniques, and instructions sufficient to direct persons
of skill through many cloning exercises are found in Berger and Kimmel, Guide to
Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc.,
San Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual
(2nd ed.) Vol. 1-3; and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds.,
Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John
Wiley & Sons, Inc., (1994 Supplement) (Ausubel). Product information from
manufacturers of biological reagents and experimental equipment also provides information
useful in known biological methods. Such manufacturers include the SIGMA chemical
company (Saint Louis, MO), R&D systems (Minneapolis, MN), Pharmacia LKB
Biotechnology (Piscataway, NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem
Genes Corp., Aldrich Chemical Company (Milwaukee, WI), Glen Research, Inc., GIBCO
BRL Life Technologies, Inc. (Gaithersberg, MD), Fluka Chemica-Biochemika Analytika
(Fluka Chemie AG, Buchs, Switzerland), Invitrogen, San Diego, CA, and Applied
Biosystems (Foster City, CA), as well as many other commercial sources known to one of
skill.

In short, using the above-described method, differentially expressed genes
can be identified without prior assumptions about which genes to monitor and without
prior knowledge of sequence. Once identified (and sequenced if not a previously
sequenced gene), the new sequences can be included in a high density array designed to
detect and quantify specific genes in the same way as described in copending applications
No. 08/529,115 filed on September 15, 1995 and PCT/US96/14839. Thus, the two
approaches are complementary in that one can be used to broadly search for expression
differences of perhaps unknown genes, while the other is used to more specifically
monitor those genes that have been chosen as important or those genes that have been
previously at least partially sequenced.

XI. Kits for Expression Monitoring and Generic Difference Screening. 

In another embodiment, this invention provides kits for expression
monitoring and/or generic difference screening. The kits include, but are not limited to, a
container or containers containing one or more high density oligonucleotide arrays of this
invention. Preferred kits for generic difference screening include at least two high density
arrays. The kits can also include a label or labels for labeling one or more nucleic acid
samples. In addition, the kits can include one or more ligatable oligonucleotides. In
certain embodiments, the kit contains pools of different ligatable oligonucleotides,
preferably pools of every possible oligonucleotide of a particular length (e.g., all possible
6-mers), or sets of specific ligatable oligonucleotides. One of skill in the art will appreciate
that the kits may include any other of the various blocking reagents, labels, devices (e.g.,
trays, microscope filters, syringes, etc.), buffers, and the like useful for performing the
hybridizations and ligation reactions described herein. In addition, the kit may include
software provided on a storage medium (e.g., optical or magnetic disk) for the selection of
probes and/or the analysis of hybridization data as described herein. In addition, the kits
may contain instructional materials teaching the use of the kit in the various methods of
this invention (e.g., in practice of various expression monitoring methods or generic
difference screening methods described herein).

XII. Computer-Implemented Expression Monitoring. 

The methods of monitoring gene expression of this invention may be
performed utilizing a computer. The computer typically runs a software program that
includes computer code incorporating the invention for analyzing hybridization intensities
measured from a substrate or chip and thus monitoring the expression of one or more
genes or screening for differences in nucleic acid abundance. Although the following will
describe specific embodiments of the invention, the invention is not limited to any one
embodiment, so the following is for purposes of illustration and not limitation.

Fig. 6 illustrates an example of a computer system used to execute the
software of an embodiment of the present invention. As shown, computer system
100 includes a monitor 102, screen 104, cabinet 106, keyboard 108, and mouse 110.
Mouse 110 may have one or more buttons such as mouse buttons 112. Cabinet 106 houses
a CD-ROM drive 114, a system memory and a hard drive (both shown in Fig. 7) which
may be utilized to store and retrieve software programs incorporating computer code that
implements the invention, data for use with the invention, and the like. Although a
CD-ROM 116 is shown as an exemplary computer readable storage medium, other computer
readable storage media including floppy disks, tape, flash memory, system memory, and
hard drives may be utilized. Cabinet 106 also houses familiar computer components (not
shown) such as a central processor, system memory, hard disk, and the like.

Fig. 7 shows a system block diagram of computer system 100 used to
execute the software of an embodiment of the present invention. As in Fig. 6, computer
system 100 includes monitor 102 and keyboard 108. Computer system 100 further
includes subsystems such as a central processor 120, system memory 122, I/O controller
124, display adapter 126, removable disk 128 (e.g., CD-ROM drive), fixed disk 130 (e.g.,
hard drive), network interface 132, and speaker 134. Other computer systems suitable for
use with the present invention may include additional or fewer subsystems. For example,
another computer system could include more than one processor 120 (i.e., a
multi-processor system) or a cache memory.

Arrows such as 136 represent the system bus architecture of computer
system 100. However, these arrows are illustrative of any interconnection scheme serving
to link the subsystems. For example, a local bus could be utilized to connect the central
processor to the system memory and display adapter. Computer system 100 shown in Fig.
7 is but an example of a computer system suitable for use with the present invention.
Other configurations of subsystems suitable for use with the present invention will be
readily apparent to one of ordinary skill in the art.

Fig. 8 shows a flowchart of a process of monitoring the expression of a
gene. The process compares hybridization intensities of pairs of perfect match and
mismatch probes that are preferably covalently attached to the surface of a substrate or
chip. Most preferably, the nucleic acid probes have a density greater than about 60
different nucleic acid probes per 1 cm² of the substrate. Although the flowcharts show a
sequence of steps for clarity, this is not an indication that the steps must be performed in
this specific order. One of ordinary skill in the art would readily recognize that many of
the steps may be reordered, combined, and deleted without departing from the invention.

Initially, nucleic acid probes are selected that are complementary to the
target sequence (or gene). These probes are the perfect match probes. Another set of
probes is specified that are intended to be not perfectly complementary to the target
sequence. These probes are the mismatch probes and each mismatch probe includes at
least one nucleotide mismatch from a perfect match probe. Accordingly, a mismatch probe
and the perfect match probe from which it was derived make up a pair of probes. As
mentioned earlier, the nucleotide mismatch is preferably near the center of the mismatch
probe.

The probe lengths of the perfect match probes are typically chosen to
exhibit high hybridization affinity with the target sequence. For example, the nucleic acid
probes may be all 20-mers. However, probes of varying lengths may also be synthesized
on the substrate for any number of reasons including resolving ambiguities.

The target sequence is typically fragmented, labeled and exposed to a
substrate including the nucleic acid probes as described earlier. The hybridization
intensities of the nucleic acid probes are then measured and input into a computer system.
The computer system may be the same system that directs the substrate hybridization or it
may be a different system altogether. Of course, any computer system for use with the
invention should have available other details of the experiment including possibly the gene
name, gene sequence, probe sequences, probe locations on the substrate, and the like.

Referring to Fig. 8, after hybridization, the computer system receives input
of hybridization intensities of the multiple pairs of perfect match and mismatch probes at
step 202. The hybridization intensities indicate hybridization affinity between the nucleic
acid probes and the target nucleic acid (which corresponds to a gene). Each pair includes a
perfect match probe that is perfectly complementary to a portion of the target nucleic acid
and a mismatch probe that differs from the perfect match probe by at least one nucleotide.

At step 204, the computer system compares the hybridization intensities of
the perfect match and mismatch probes of each pair. If the gene is expressed, the
hybridization intensity (or affinity) of a perfect match probe of a pair should be
recognizably higher than the corresponding mismatch probe. Generally, if the
hybridization intensities of a pair of probes are substantially the same, it may indicate the
gene is not expressed. However, the determination is not based on a single pair of probes;
the determination of whether a gene is expressed is based on an analysis of many pairs of
probes. An exemplary process of comparing the hybridization intensities of the pairs of
probes will be described in more detail in reference to Fig. 9.

After the system compares the hybridization intensity of the perfect match
and mismatch probes, the system indicates expression of the gene at step 206. As an
example, the system may indicate to a user that the gene is either present (expressed),
marginal or absent (unexpressed).

Fig. 9 shows a flowchart of a process of determining if a gene is expressed
utilizing a decision matrix. At step 252, the computer system receives raw scan data of N
pairs of perfect match and mismatch probes. In a preferred embodiment, the hybridization
intensities are photon counts from a fluorescein labeled target that has hybridized to the
probes on the substrate. For simplicity, the hybridization intensity of a perfect match probe
will be designated "$I_{pm}$" and the hybridization intensity of a mismatch probe will be
designated "$I_{mm}$."

Hybridization intensities for a pair of probes are retrieved at step 254. The
background signal intensity is subtracted from each of the hybridization intensities of the
pair at step 256. Background subtraction may also be performed on all the raw scan data at
the same time.

At step 258, the hybridization intensities of the pair of probes are compared
to a difference threshold (D) and a ratio threshold (R). It is determined if the difference
between the hybridization intensities of the pair ($I_{pm} - I_{mm}$) is greater than or equal to the
difference threshold AND the quotient of the hybridization intensities of the pair ($I_{pm} / I_{mm}$)
is greater than or equal to the ratio threshold. The thresholds are typically user
defined values that have been determined to produce accurate expression monitoring of a
gene or genes. In one embodiment, the difference threshold is 20 and the ratio threshold is
[...].

If $I_{pm} - I_{mm} \geq D$ and $I_{pm} / I_{mm} \geq R$, the value NPOS is incremented at step
260. In general, NPOS is a value that indicates the number of pairs of probes which have
hybridization intensities indicating that the gene is likely expressed. NPOS is utilized in a
determination of the expression of the gene.

At step 262, it is determined if $I_{mm} - I_{pm} \geq D$ and $I_{mm} / I_{pm} \geq R$. If this
expression is true, the value NNEG is incremented at step 264. In general, NNEG is a
value that indicates the number of pairs of probes which have hybridization intensities
indicating that the gene is likely not expressed. NNEG, like NPOS, is utilized in a
determination of the expression of the gene.

For each pair that exhibits hybridization intensities either indicating the
gene is expressed or not expressed, a log ratio value (LR) and intensity difference value
(IDIF) are calculated at step 266. LR is calculated by the log of the quotient of the
hybridization intensities of the pair ($I_{pm} / I_{mm}$). The IDIF is calculated by the difference
between the hybridization intensities of the pair ($I_{pm} - I_{mm}$). If there is a next pair of
hybridization intensities at step 268, they are retrieved at step 254.
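The per-pair loop of steps 254 through 268 can be sketched as follows. Several points are assumptions made for illustration: the logarithm is taken as base 10, pairs whose background-subtracted intensities are not positive are skipped because the quotient and logarithm would be undefined, and the ratio threshold of 1.2 used in the example is a hypothetical value. The function name and return layout are likewise illustrative.

```python
import math


def tally_pairs(pairs, diff_threshold, ratio_threshold, background=0.0):
    """Count positive and negative probe pairs and collect LR and IDIF."""
    npos = nneg = 0
    log_ratios, idifs = [], []
    for raw_pm, raw_mm in pairs:
        pm, mm = raw_pm - background, raw_mm - background
        if pm <= 0 or mm <= 0:
            continue  # assumption: quotient/log undefined, pair skipped
        positive = pm - mm >= diff_threshold and pm / mm >= ratio_threshold
        negative = mm - pm >= diff_threshold and mm / pm >= ratio_threshold
        if positive:
            npos += 1
        elif negative:
            nneg += 1
        if positive or negative:
            log_ratios.append(math.log10(pm / mm))  # assumption: base 10
            idifs.append(pm - mm)
    return {"N": len(pairs), "NPOS": npos, "NNEG": nneg,
            "LR": log_ratios, "IDIF": idifs}


# Hypothetical photon counts for four (perfect match, mismatch) pairs.
pairs = [(220.0, 70.0), (70.0, 220.0), (100.0, 95.0), (520.0, 120.0)]
result = tally_pairs(pairs, diff_threshold=20, ratio_threshold=1.2,
                     background=20.0)
```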

At step 272, a decision matrix is utilized to indicate if the gene is expressed.
The decision matrix utilizes the values N, NPOS, NNEG, and LR (multiple LRs). The
following three assignments are performed:

P1 = NPOS / NNEG
P2 = NPOS / N
P3 = (10 * SUM(LR)) / (NPOS + NNEG)

These P values are then utilized to determine if the gene is expressed.

For purposes of illustration, the P values are broken down into ranges. If P1
is greater than or equal to 2.1, then A is true. If P1 is less than 2.1 and greater than or
equal to 1.8, then B is true. Otherwise, C is true. Thus, P1 is broken down into three
ranges A, B and C. This is done to aid the reader's understanding of the invention.

Thus, all of the P values are broken down into ranges according to the
following:

A = (P1 >= 2.1)
B = (2.1 > P1 >= 1.8)
C = (P1 < 1.8)

X = (P2 >= 0.35)
Y = (0.35 > P2 >= 0.20)
Z = (P2 < 0.20)

Q = (P3 >= 1.5)
R = (1.5 > P3 >= 1.1)
S = (P3 < 1.1)

Once the P values are broken down into ranges according to the above boolean values, the
gene expression is determined.

The gene expression is indicated as present (expressed), marginal or absent
(not expressed). The gene is indicated as expressed if the following expression is true: A
and (X or Y) and (Q or R). In other words, the gene is indicated as expressed if P1 >= 2.1,
P2 >= 0.20 and P3 >= 1.1. Additionally, the gene is indicated as expressed if the following
expression is true: B and X and Q.

With the foregoing explanation, the following is a summary of the gene
expression indications:

Present     A and (X or Y) and (Q or R)
            B and X and Q

Marginal    A and X and S
            B and X and R
            B and Y and (Q or R)

Absent      All other cases (e.g., any C combination)

In the output to the user, present may be indicated as "P," marginal as "M" and absent as
"A" at step 274.
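The decision matrix of steps 272 and 274 can be written compactly as below. This is a sketch only: the function name is hypothetical, P1 is treated as infinite when NNEG is zero and NPOS is not, and P3 is treated as zero when no pair was scored. Both conventions are assumptions, since those edge cases are not addressed above.

```python
import math


def absolute_call(n, npos, nneg, log_ratios):
    """Return 'P' (present), 'M' (marginal) or 'A' (absent)."""
    if nneg:
        p1 = npos / nneg
    else:
        p1 = math.inf if npos else 0.0  # assumption for NNEG == 0
    p2 = npos / n
    scored = npos + nneg
    p3 = 10 * sum(log_ratios) / scored if scored else 0.0  # assumption

    a, b = p1 >= 2.1, 1.8 <= p1 < 2.1
    x, y = p2 >= 0.35, 0.20 <= p2 < 0.35
    q, r, s = p3 >= 1.5, 1.1 <= p3 < 1.5, p3 < 1.1

    if (a and (x or y) and (q or r)) or (b and x and q):
        return "P"
    if (a and x and s) or (b and x and r) or (b and y and (q or r)):
        return "M"
    return "A"
```

For instance, with N = 20, NPOS = 12, NNEG = 2 and fourteen log ratios of 0.3, the values are P1 = 6, P2 = 0.6 and P3 = 3, so the call is "P".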

Once all the pairs of probes have been processed and the expression of the
gene indicated, an average of ten times the LRs is computed at step 275. Additionally, an
average of the IDIF values for the probes that incremented NPOS and NNEG is calculated.
These values may be utilized for quantitative comparisons of this experiment with other
experiments.

Quantitative measurements may be performed at step 276. For example, the
current experiment may be compared to a previous experiment (e.g., utilizing values
calculated at step 270). Additionally, the experiment may be compared to hybridization
intensities of RNA (such as from bacteria) present in the biological sample in a known
quantity. In this manner, one may verify the correctness of the gene expression indication
or call, modify threshold values, or perform any number of modifications of the preceding.

For simplicity, Fig. 9 was described in reference to a single gene. However, 
the process may be utilized on multiple genes in a biological sample. Therefore, any 
discussion of the analysis of a single gene is not an indication that the process may not be 
extended to processing multiple genes. 

Figs. 10A and 10B show the flow of a process of determining the 
expression of a gene by comparing baseline scan data and experimental scan data. For 
example, the baseline scan data may be from a biological sample where it is known the 
gene is expressed. Thus, this scan data may be compared to a different biological sample 
to determine if the gene is expressed. Additionally, it may be determined how the 
expression of a gene or genes changes over time in a biological organism. 

At step 302, the computer system receives raw scan data of N pairs of
perfect match and mismatch probes from the baseline. The hybridization intensity of a
perfect match probe from the baseline will be designated "$I_{pm}$" and the hybridization intensity
of a mismatch probe from the baseline will be designated "$I_{mm}$." The background signal
intensity is subtracted from each of the hybridization intensities of the pairs of baseline
scan data at step 304.

At step 306, the computer system receives raw scan data of N pairs of
perfect match and mismatch probes from the experimental biological sample. The
hybridization intensity of a perfect match probe from the experiment will be designated
"$J_{pm}$" and the hybridization intensity of a mismatch probe from the experiment will be
designated "$J_{mm}$." The background signal intensity is subtracted from each of the
hybridization intensities of the pairs of experimental scan data at step 308.

The hybridization intensities of an I and J pair may be normalized at step
310. For example, the hybridization intensities of the I and J pairs may be divided by the
hybridization intensity of control probes as discussed above in Section IV(A).

At step 312, the hybridization intensities of the I and J pair of probes are
compared to a difference threshold (DDIF) and a ratio threshold (RDIF). It is determined
if the difference between the hybridization intensities of the one pair ($J_{pm} - J_{mm}$) and the
other pair ($I_{pm} - I_{mm}$) is greater than or equal to the difference threshold AND the quotient
of the hybridization intensities of one pair ($J_{pm} - J_{mm}$) and the other pair ($I_{pm} - I_{mm}$) is
greater than or equal to the ratio threshold. The thresholds are typically user
defined values that have been determined to produce accurate expression monitoring of a
gene or genes.

If $(J_{pm} - J_{mm}) - (I_{pm} - I_{mm}) \geq \mathrm{DDIF}$ and $(J_{pm} - J_{mm}) / (I_{pm} - I_{mm}) \geq \mathrm{RDIF}$, the
value NINC is incremented at step 314. In general, NINC is a value that indicates the
experimental pair of probes indicates that the gene expression is likely greater (or
increased) than the baseline sample. NINC is utilized in a determination of whether the
expression of the gene is greater (or increased), less (or decreased) or did not change in the
experimental sample compared to the baseline sample.

At step 316, it is determined if $(I_{pm} - I_{mm}) - (J_{pm} - J_{mm}) \geq \mathrm{DDIF}$ and
$(I_{pm} - I_{mm}) / (J_{pm} - J_{mm}) \geq \mathrm{RDIF}$. If this expression is true, NDEC is incremented. In general,
NDEC is a value that indicates the experimental pair of probes indicates that the gene
expression is likely less (or decreased) than the baseline sample. NDEC is utilized in a
determination of whether the expression of the gene is greater (or increased), less (or
decreased) or did not change in the experimental sample compared to the baseline sample.

For each of the pairs that exhibits hybridization intensities either indicating
the gene is expressed more or less in the experimental sample, the values NPOS, NNEG
and LR are calculated for each pair of probes. These values are calculated as discussed
above in reference to Fig. 9. A suffix of either "B" or "E" has been added to each value in
order to indicate if the value denotes the baseline sample or the experimental sample,
respectively. If there are next pairs of hybridization intensities at step 322, they are
processed in a similar manner as shown.
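The baseline-versus-experiment tally can be sketched as follows. The decrease test is written here as the mirror image of the increase test, which is an interpretive assumption. Skipping pairs whose perfect match minus mismatch difference is not positive, the function name, and the threshold values in the example are also assumptions; no DDIF or RDIF values are given above.

```python
def tally_changes(baseline, experiment, ddif, rdif):
    """Count probe pairs indicating increased (NINC) or decreased (NDEC)
    expression; inputs are background-subtracted (pm, mm) tuples in the
    same probe order for both samples."""
    ninc = ndec = 0
    for (i_pm, i_mm), (j_pm, j_mm) in zip(baseline, experiment):
        i_diff, j_diff = i_pm - i_mm, j_pm - j_mm
        if i_diff <= 0 or j_diff <= 0:
            continue  # assumption: quotient not meaningful, pair skipped
        if j_diff - i_diff >= ddif and j_diff / i_diff >= rdif:
            ninc += 1
        elif i_diff - j_diff >= ddif and i_diff / j_diff >= rdif:
            ndec += 1
    return ninc, ndec


# Hypothetical intensities and thresholds, for illustration only.
baseline = [(200.0, 100.0), (300.0, 100.0), (150.0, 100.0)]
experiment = [(400.0, 100.0), (150.0, 100.0), (160.0, 100.0)]
counts = tally_changes(baseline, experiment, ddif=50.0, rdif=1.5)
```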

Referring now to Fig. 10B, an absolute decision computation is performed
for both the baseline and experimental samples at step 324. The absolute decision
computation is an indication of whether the gene is expressed, marginal or absent in each
of the baseline and experimental samples. Accordingly, in a preferred embodiment, this
step entails performing steps 272 and 274 from Fig. 9 for each of the samples. This being
done, there is an indication of gene expression for each of the samples.

At step 326, a decision matrix is utilized to determine the difference in gene
expression between the two samples. This decision matrix utilizes the values N, NPOSB,
NPOSE, NNEGB, NNEGE, NINC, NDEC, LRB, and LRE as they were calculated above.
The decision matrix performs different calculations depending on whether NINC is greater
than or equal to NDEC. The calculations are as follows.

If NINC >= NDEC, the following four P values are determined:

P1 = NINC / NDEC
P2 = NINC / N
P3 = ((NPOSE - NPOSB) - (NNEGE - NNEGB)) / N
P4 = 10 * SUM(LRE - LRB) / N

These P values are then utilized to determine the difference in gene expression between the
two samples.

For purposes of illustration, the P values are broken down into ranges as
was done previously. Thus, all of the P values are broken down into ranges according to
the following:

A = (P1 >= 2.7)
B = (2.7 > P1 >= 1.8)
C = (P1 < 1.8)

X = (P2 >= 0.24)
Y = (0.24 > P2 >= 0.16)
Z = (P2 < 0.16)

M = (P3 >= 0.17)
N = (0.17 > P3 >= 0.10)
O = (P3 < 0.10)

Q = (P4 >= 1.3)
R = (1.3 > P4 >= 0.9)
S = (P4 < 0.9)
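Each of the four P values is split into three ranges by a pair of cutoffs, so the classification can be driven by a small table. The sketch below uses the cutoffs listed above; the helper names and the dictionary layout are hypothetical.

```python
def classify(value, high, low, labels):
    """Return the first label if value >= high, the second if
    low <= value < high, and the third otherwise."""
    top, middle, bottom = labels
    if value >= high:
        return top
    if value >= low:
        return middle
    return bottom


# (upper cutoff, lower cutoff, range labels) for the comparison matrix.
COMPARISON_RANGES = {
    "P1": (2.7, 1.8, "ABC"),
    "P2": (0.24, 0.16, "XYZ"),
    "P3": (0.17, 0.10, "MNO"),
    "P4": (1.3, 0.9, "QRS"),
}


def classify_all(p_values):
    """Map each P value name to its range label."""
    return {name: classify(p_values[name], *COMPARISON_RANGES[name])
            for name in COMPARISON_RANGES}


labels = classify_all({"P1": 3.0, "P2": 0.2, "P3": 0.05, "P4": 1.0})
```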

Once the P values are broken down into ranges according to the above boolean values, the
difference in gene expression between the two samples is determined.

In this case where NINC >= NDEC, the gene expression change is indicated
as increased, marginal increase or no change. The following is a summary of the gene
expression indications:
Increased 



A and (X or Y) and (Q or R) and (M or N or O) 
A and (X or Y) and (Q or R or S) and (M or N) 

B and (X or Y) and (Q or R) and (M or N) 
A and X and (Q or R or S) and (M or N or O) 



v j Marginal AorYorSorO 

[| 5 increase B and (X or Y) and (Q or R) and O 



No Change 



B and (X or Y) and S and (M orN) 
C and (X or Y) and (Q or R) and (M or N) 

All others cases {e.g., any Z combination) 



in the output to the user, increased ma, be indicated as V marginal increase as Iff and 

no change as "NC." 
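The P value computation, the range assignment and the "Increased" rows of the summary can be sketched as follows. This is a minimal illustration only: the function names and the guard for NDEC equal to zero are assumptions not found in the text, SUM(LRE - LRB) is passed in as a precomputed number, and the marginal and no change rows are deliberately left out.

```python
def p_values_increase(n, ninc, ndec, npos_b, npos_e, nneg_b, nneg_e, lr_sum_diff):
    """Return (P1, P2, P3, P4) for the case NINC >= NDEC.

    lr_sum_diff stands for SUM(LRE - LRB), computed elsewhere.
    """
    # Assumption: treat NDEC == 0 as an unbounded ratio (not specified in the text).
    p1 = ninc / ndec if ndec else float("inf")
    p2 = ninc / n
    p3 = ((npos_e - npos_b) - (nneg_e - nneg_b)) / n
    p4 = 10 * lr_sum_diff / n
    return p1, p2, p3, p4


def bin_p_values(p1, p2, p3, p4):
    """Map each P value onto its range letter (A/B/C, X/Y/Z, M/N/O, Q/R/S)."""
    a = "A" if p1 >= 2.7 else "B" if p1 >= 1.8 else "C"
    x = "X" if p2 >= 0.24 else "Y" if p2 >= 0.16 else "Z"
    m = "M" if p3 >= 0.17 else "N" if p3 >= 0.10 else "O"
    q = "Q" if p4 >= 1.3 else "R" if p4 >= 0.9 else "S"
    return a, x, m, q


def is_increased(a, x, m, q):
    """Evaluate the four 'Increased' rows of the summary, one row per line."""
    return (
        (a == "A" and x in ("X", "Y") and q in ("Q", "R") and m in ("M", "N", "O"))
        or (a == "A" and x in ("X", "Y") and q in ("Q", "R", "S") and m in ("M", "N"))
        or (a == "B" and x in ("X", "Y") and q in ("Q", "R") and m in ("M", "N"))
        or (a == "A" and x == "X" and q in ("Q", "R", "S") and m in ("M", "N", "O"))
    )
```

For instance, with N = 20, NINC = 12 and NDEC = 3 the first ratio is 4.0, which falls in range A; whether the call is "increased" then depends on the other three letters.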

If NINC < NDEC, the following four P values are determined:

P1 = NDEC / NINC
P2 = NDEC / N
P3 = ((NNEGE - NNEGB) - (NPOSE - NPOSB)) / N
P4 = 10 * SUM(LRE - LRB) / N

These P values are then utilized to determine the difference in gene expression between the
two samples.

The P values are broken down into the same ranges as for the other case
where NINC >= NDEC. Thus, P values in this case indicate the same ranges and will not
be repeated for the sake of brevity. However, the ranges generally indicate different
changes in the gene expression between the two samples as shown below.

In this case where NINC < NDEC, the gene expression change is indicated 
as decreased, marginal decrease or no change. The following is a summary of the gene 
expression indications: 

Decreased        A and (X or Y) and (Q or R) and (M or N or O)
                 A and (X or Y) and (Q or R or S) and (M or N)
                 B and (X or Y) and (Q or R) and (M or N)
                 A and X and (Q or R or S) and (M or N or O)

Marginal         A or Y or S or O
Decrease         B and (X or Y) and (Q or R) and O
                 B and (X or Y) and S and (M or N)
                 C and (X or Y) and (Q or R) and (M or N)

No Change        All other cases (e.g., any Z combination)

In the output to the user, decreased may be indicated as "D," marginal decrease as "MD"
and no change as "NC."

The above has shown that the relative difference between the gene
expression between a baseline sample and an experimental sample may be determined. An
additional test may be performed that would change an I, MI, D, or MD (i.e., not NC) call
to NC if the gene is indicated as expressed in both samples (e.g., from step 324) and the
following expressions are all true:

Average(IDIFB) >= 200
Average(IDIFE) >= 200
1.4 >= Average(IDIFE) / Average(IDIFB) >= 0.7

Thus, when a gene is expressed in both samples, a call of increased or decreased (whether
marginal or not) will be changed to a no change call if the average intensity difference for
each sample is relatively large or substantially the same for both samples. The IDIFB and
IDIFE are calculated as the sum of all the IDIFs for each sample divided by N.
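The additional test can be sketched as a small function. This is an illustrative sketch, not the software described herein: the function name, the argument names and the use of boolean flags for the "expressed" indications are assumptions, while the thresholds (200, 0.7 and 1.4) are the ones listed above.

```python
def apply_no_change_override(call, expressed_b, expressed_e, idif_b, idif_e):
    """Return "NC" when the three listed expressions hold, else the original call.

    call        -- one of "I", "MI", "D", "MD", "NC"
    expressed_b -- True if the gene is called expressed in the baseline sample
    expressed_e -- True if the gene is called expressed in the experimental sample
    idif_b      -- intensity differences for the baseline sample (N values)
    idif_e      -- intensity differences for the experimental sample (N values)
    """
    if call == "NC" or not (expressed_b and expressed_e):
        return call
    avg_b = sum(idif_b) / len(idif_b)  # sum of the IDIFs divided by N
    avg_e = sum(idif_e) / len(idif_e)
    # avg_b >= 200 is checked first, so the division below cannot divide by zero.
    if avg_b >= 200 and avg_e >= 200 and 0.7 <= avg_e / avg_b <= 1.4:
        return "NC"
    return call
```

With averages of 400 and 450 the ratio is 1.125, so an "I" call would be changed to "NC"; with averages of 400 and 1000 the ratio is 2.5 and the call is left alone.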

At step 328, values for quantitative difference evaluation are calculated. An
average of ((Ipm - Imm)E - (Ipm - Imm)B) for each of the pairs is calculated. Additionally, a
quotient of the average of (Ipm - Imm)E and the average of (Ipm - Imm)B is calculated. These
values may be utilized to compare the results with other experiments in step 330.

EXAMPLES 

The following examples are offered to illustrate, but not to limit the present
invention.

Example 1 

First Generation Oligonucleotide Arrays Designed to Measure mRNA
Levels for a Small Number of Murine Cytokines.

A) Preparation of Labeled RNA. 

I) From Each of the Preselected Genes. 

Fourteen genes (IL-2, IL-3, IL-4, IL-6, IL-10, IL-12p40, GM-CSF, IFN-γ,
TNF-α, CTLA8, β-actin, GAPDH, IL-11 receptor, and Bio B) were each cloned into the
pBluescript II KS (+) phagemid (Stratagene, La Jolla, California, USA). The orientation of
the insert was such that T3 RNA polymerase gave sense transcripts and T7 polymerase
gave antisense RNA.

Labeled ribonucleotides in an in vitro transcription (IVT) reaction. Either
biotin- or fluorescein-labeled UTP and CTP (1:3 labeled to unlabeled) plus unlabeled ATP
and GTP were used for the reaction with 2500 units of T7 RNA polymerase (Epicentre
Technologies, Madison, Wisconsin, USA). In vitro transcription was done with cut
templates in a manner like that described by Melton et al., Nucleic Acids Research, 12:
7035-7056 (1984). A typical in vitro transcription reaction used 5 µg DNA template, a
buffer such as that included in Ambion's Maxiscript in vitro Transcription Kit (Ambion
Inc., Houston, Texas, USA) and GTP (3 mM), ATP (1.5 mM), and CTP and fluoresceinated
UTP (3 mM total, UTP: Fl-UTP 3:1) or UTP and fluoresceinated CTP (2 mM total, CTP:
Fl-CTP, 3:1). Reactions done in the Ambion buffer had 20 mM DTT and RNase inhibitor.
The reaction was run from 1.5 to about 8 hours.

Following the reaction, unincorporated nucleotide triphosphates were
removed using a size-selective membrane (microcon-100) or Pharmacia microspin S-200
column. The total molar concentration of RNA was based on a measurement of the
absorbance at 260 nm. Following quantitation of RNA amounts, RNA was fragmented
randomly to an average length of approximately 50 - 100 bases by heating at 94°C in 40
mM Tris-acetate pH 8.1, 100 mM potassium acetate, 30 mM magnesium acetate for 30 -
40 minutes. Fragmentation reduces possible interference from RNA secondary structure,
and minimizes the effects of multiple interactions with closely spaced probe molecules.

2) From cDNA libraries. 

Labeled RNA was produced from one of two murine cell lines: T10, a B
cell plasmacytoma which was known not to express the genes (except IL-10, actin and
GAPDH) used as target genes in this study, and 2D6, an IL-12 growth dependent T cell
line (Th1 subtype) that is known to express most of the genes used as target genes in this
study. Thus, RNA derived from the T10 cell line provided a good total RNA baseline
mixture suitable for spiking with known quantities of RNA from the particular target
genes. In contrast, mRNA derived from the 2D6 cell line provided a good positive control
providing typical endogenously transcribed amounts of the RNA from the target genes.

i) The T10 murine B cell line.
The T10 cell line (B cells) was derived from the IL-6 dependent murine
plasmacytoma line T1165 (Nordan et al. (1986) Science 233: 566-569) by selection in the
presence of IL-11. To prepare the directional cDNA library, total cellular RNA was
isolated from T10 cells using RNAStat60 (Tel-Test B), and poly (A)+ RNA was selected
using the PolyAtract kit (Promega, Madison, Wisconsin, USA). First and second strand
cDNA was synthesized according to Toole et al., (1984) Nature, 312: 342-347, except that
5-methyldeoxycytidine 5'triphosphate (Pharmacia LKB, Piscataway, New Jersey, USA)
was substituted for dCTP in both reactions.

To determine cDNA frequencies T10 libraries were plated, and DNA was
transferred to nitrocellulose filters and probed with labeled β-actin, GAPDH and IL-10
probes. Actin was represented at a frequency of 1:3000, GAPDH at 1:1,000, and IL-10 at
[illegible]. Labeled sense and antisense T10 RNA samples were synthesized from NotI and
SfiI cut cDNA libraries in in vitro transcription reactions as described above.

ii) The 2D6 murine helper T cells line.
The 2D6 cell line is a murine IL-12 dependent T cell line developed by
Fujiwara et al. Cells were cultured in RPMI 1640 medium with 10% heat inactivated fetal
calf serum (JRH Biosciences), 0.05 mM β-mercaptoethanol and recombinant murine IL-12
(100 units/mL, Genetics Institute, Cambridge, Massachusetts, USA). For cytokine
induction, cells were preincubated overnight in IL-12 free medium and then resuspended
(10^6 cells/ml). After incubation for 0, 2, 6 and 24 hours in media containing 5 nM calcium
ionophore A23187 (Sigma Chemical Co., St. Louis, Missouri, USA) and 100 nM
4-phorbol-12-myristate 13-acetate (Sigma), cells were collected by centrifugation and
washed once with phosphate buffered saline prior to isolation of RNA.

Labeled 2D6 mRNA was produced by directionally cloning the 2D6 cDNA
with λZipLox, NotI-SalI arms available from GibcoBRL in a manner similar to T10. The
linearized pZL1 library was transcribed with T7 to generate sense RNA as described above.

iii) RNA preparation.
For material made directly from cellular RNA, cytoplasmic RNA was
extracted from cells by the method of Favaloro et al., (1980) Meth. Enzymol., 65: 718-749
and poly (A)+ RNA was isolated with an oligo dT selection step (PolyAtract, Promega).
RNA was amplified using a modification of the procedure described by Eberwine et al.
(1992) Proc. Natl. Acad. Sci. USA, 89: 3010-3014 (see also Van Gelder et al. (1990)
Science 87: 1663-1667). One microgram of poly (A)+ RNA was converted into
double-stranded cDNA using a cDNA synthesis kit (Life Technologies) with an oligo dT
primer incorporating a T7 RNA polymerase promoter site. After second strand synthesis,
the reaction mixture was extracted with phenol/chloroform and the double-stranded DNA
isolated using a membrane filtration step (Microcon-100, Amicon, Inc., Beverly,
Massachusetts, USA). Labeled cRNA was made directly from the cDNA pool with an IVT
step as described above. The total molar concentration of labeled cRNA was determined
from the absorbance at 260 and assuming an average RNA size of 1000 ribonucleotides.
RNA concentration was calculated using the conventional conversion that 1 OD is
equivalent to 40 µg of RNA, and that 1 µg of cellular mRNA consists of 3 pmoles of RNA
molecules.

Cellular mRNA was also labeled directly without any intermediate cDNA
or RNA synthesis steps. Poly (A)+ RNA was fragmented as described above, and the 5'
ends of the fragments were kinased and then incubated overnight with a biotinylated
[...] Technologies). Alternatively, mRNA was labeled directly by UV-induced crosslinking to a
psoralen derivative linked to biotin (Schleicher & Schuell).

B) High Density Array Preparation 

A high density array of 20 mer oligonucleotide probes was produced using
VLSIPS technology. The high density array included the oligonucleotide probes as listed
in Table 2. A central mismatch control probe was provided for each gene-specific probe
resulting in a high density array containing over 16,000 different oligonucleotide probes.

Target Nucleic Acid                          Number of Probes

IL-2                                         691
IL-3                                         751
IL-4                                         361
IL-6                                         691
IL-10                                        481
IL-12p40                                     911
GM-CSF                                       661
IFN-γ                                        991
TNF-α                                        641
mCTLA8                                       391
IL-11 receptor                               158

House Keeping Genes:
GAPDH                                        388
β-actin                                      669

Bacterial gene (sample preparation/amplification control):
Bio B                                        286

The high density array was synthesized on a planar glass slide.

C) Array Hybridization and Scanning.

The RNA transcribed from cDNA was hybridized to the high density
oligonucleotide probe array(s) at low stringency and then washed under more stringent
conditions. The hybridization solutions contained 0.9 M NaCl, 60 mM NaH2PO4, 6 mM
EDTA and 0.005% Triton X-100, adjusted to pH 7.6 (referred to as 6x SSPE-T). In
addition, the solutions contained 0.5 mg/ml unlabeled, degraded herring sperm DNA
(Sigma Chemical Co., St. Louis, Missouri, USA). Prior to hybridization, RNA samples
were heated in the hybridization solution to 9[illegible]°C for 10 minutes, placed on ice for 5
minutes, and allowed to equilibrate at room temperature before being placed in the
hybridization flow cell. Following hybridization, the solution was removed, the arrays were
washed with 6x SSPE-T at 22°C for 7 minutes, and then washed with 0.5x SSPE-T at
40°C for [illegible] minutes. When biotin-labeled RNA was used, the hybridized RNA was
stained with a streptavidin-phycoerythrin conjugate (Molecular Probes, Inc., Eugene,
Oregon, USA) prior to reading. Hybridized arrays were stained with 2 µg/ml
streptavidin-phycoerythrin in 6x SSPE-T at 40°C for 5 minutes.

The arrays were read using a scanning confocal microscope (Molecular
Dynamics, Sunnyvale, California, USA) modified for the purpose. The scanner uses an
argon ion laser as the excitation source, and the emission was detected with a
photomultiplier tube through either a 530 nm bandpass filter (fluorescein) or a 560 nm
longpass filter (phycoerythrin).

Nucleic acids of either sense or antisense orientations were used in
hybridization experiments. Arrays with probes for either orientation (reverse complements
of each other) were made using the same set of photolithographic masks by reversing the
order of the photochemical steps and incorporating the complementary nucleotide.

D) Quantitative Analysis of Hybridization Patterns and Intensities.

The quantitative analysis of the hybridization results involved counting the
instances in which the perfect match probe (PM) was brighter than the
mismatch probe (MM), averaging the differences (PM minus MM) for each probe family
(i.e., probe collection for each gene), and comparing the values to those obtained in a
side-by-side experiment on an identically synthesized array with an unspiked sample (if
applicable). The advantage of the difference method is that signals from random cross
hybridization contribute equally, on average, to the PM and MM probes while specific
hybridization contributes more to the PM probe. By averaging the pairwise differences,
the real signals add constructively while the contributions from cross hybridization tend to
cancel.

The magnitude of the changes in the average of the difference (PM-MM)
values was interpreted by comparison with the results of spiking experiments as well as the
signal observed for the internal standard bacterial RNA spiked into each sample at a known
amount. Analysis was performed using algorithms and software described herein.

E) Optimization of Probe Selection

In order to optimize probe selection for each of the target genes, the high
density array of oligonucleotide probes was hybridized with the mixture of labeled RNAs
transcribed from each of the target genes. Fluorescence intensity at each location on the
high density array was determined by scanning the high density array with a laser
[...] system.

Probes were then selected for further data analysis in a two-step procedure.
First, in order to be counted, the difference in intensity between a probe and its
corresponding mismatch probe had to exceed a threshold limit (50 counts, or about [...]
in this case). This eliminated from consideration probes that did not hybridize
well and probes for which the mismatch control hybridizes at an intensity comparable to
the perfect match.

Second, the high density array was hybridized with a labeled RNA sample that, in
principle, contains none of the sequences on the high density array. In this case, the
RNA population should have been incapable of hybridizing to any of the probes on
the array. Where either a probe or its mismatch showed a signal above a threshold value
(100 counts above background) it was not included in subsequent analysis.

Then, the signal for a particular gene was counted as the average difference
(perfect match - mismatch control) for the selected probes for each gene.
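The two-step selection and the average difference signal can be sketched as follows. This is an illustration under stated assumptions rather than the analysis software itself: the function and argument names are hypothetical, intensities are plain lists of counts indexed by probe pair, and `pm_blank` / `mm_blank` stand for the readings from the hybridization with the sample that should not hybridize to any probe.

```python
def select_probe_pairs(pm, mm, pm_blank, mm_blank, background,
                       diff_threshold=50.0, blank_threshold=100.0):
    """Return the indices of probe pairs that survive both selection steps."""
    selected = []
    for i in range(len(pm)):
        # Step 1: PM minus MM has to exceed the threshold (50 counts here).
        if pm[i] - mm[i] <= diff_threshold:
            continue
        # Step 2: drop the pair if either probe lights up in the blank sample
        # (more than 100 counts above background).
        limit = background + blank_threshold
        if pm_blank[i] > limit or mm_blank[i] > limit:
            continue
        selected.append(i)
    return selected


def gene_signal(pm, mm, selected):
    """Average (PM - MM) difference over the selected probe pairs."""
    diffs = [pm[i] - mm[i] for i in selected]
    return sum(diffs) / len(diffs) if diffs else 0.0


def count_pm_brighter(pm, mm):
    """Number of probe pairs in which PM is brighter than MM."""
    return sum(1 for p, m in zip(pm, mm) if p > m)
```

Because cross hybridization contributes roughly equally to PM and MM, the per-pair subtraction in `gene_signal` is what lets those contributions cancel on average.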

F) Results: The High Density Arrays Provide Specific and Sensitive Detection of
Target Nucleic Acids.

As explained above, the initial arrays contained more than 16,000 probes
that were complementary to 12 murine mRNAs (9 cytokines, 1 cytokine receptor, 2
constitutively expressed genes (β-actin and glyceraldehyde 3-phosphate dehydrogenase)),
1 rat cytokine and 1 bacterial gene (E. coli biotin synthetase, bioB) which serves as a
quantitation reference. The initial experiments with these relatively simple arrays were
designed to determine whether short in situ synthesized oligonucleotides can be made to
hybridize with sufficient sensitivity and specificity to quantitatively detect RNAs in a
complex cellular RNA population. These arrays were intentionally highly redundant,
containing hundreds of oligonucleotide probes per RNA, many more than necessary for the
determination of expression levels. This was done to investigate the hybridization
behavior of a large number of probes and develop general sequence rules for a priori
selection of minimal probe sets for arrays covering substantially larger numbers of genes.

The oligonucleotide arrays contained collections of pairs of probes for each 
of the RNAs being monitored. Each probe pair consisted of a 20-mer that was perfectly 
complementary (referred to as a perfect match, or PM probe) to a subsequence of a 
particular message, and a companion that was identical except for a single base difference 
in a central position. The mismatch (MM) probe of each pair served as an internal control 
for hybridization specificity. The analysis of PM/MM pairs allowed low intensity 
hybridization patterns from rare RNAs to be sensitively and accurately recognized in the
presence of crosshybridization signals. 

For array hybridization experiments, labeled RNA target samples were 
prepared from individual clones, cloned cDNA libraries, or directly from cellular mRNA
as described above. Target RNA for array hybridization was prepared by incorporating
fluorescently labeled ribonucleotides in an in vitro transcription (IVT) reaction and then
randomly fragmenting the RNA to an average size of 30 - 100 bases. Samples were
hybridized to arrays in a self-contained flow cell (volume ~200 µL) for times ranging from
30 minutes to 22 hours. Fluorescence imaging of the arrays was accomplished with a
scanning confocal microscope (Molecular Dynamics). The entire array was read at a
resolution of 11.25 µm (~80-fold oversampling in each of the 100 x 100 µm synthesis
regions) in less than 15 minutes, yielding a rapid and quantitative measure of each of the 
individual hybridization reactions. 

1) Specificity of Hybridization 

In order to evaluate the specificity of hybridization, the high density array 
described above was hybridized with 50 pM of the RNA sense strand of IL-2, IL-3, IL-4,
IL-6, Actin, GAPDH and Bio B or IL-10, IL-12p40, GM-CSF, IFN-γ, TNF-α, mCTLA8
and Bio B. The hybridized array showed strong specific signals for each of the test target
nucleic acids with minimal cross hybridization.

2) Detection of Gene Expression levels in a complex target sample.

To determine how well individual RNA targets could be detected in the
presence of total mammalian cell message populations, spiking experiments were carried
out. Known amounts of individual RNA targets were spiked into labeled RNA derived
from the murine B cell (T10) cDNA library. The T10 cell
line was chosen because of the cytokines being monitored, only IL-10 is expressed at a
detectable level.

Because simply spiking the RNA mixture with the selected target genes and
then immediately hybridizing might provide an artificially elevated reading relative to the
rest of the mixture, the spiked sample was treated to a series of procedures to mitigate
differences between the library RNA and the added RNA. Thus the "spike" was added to
the sample which was then heated to 37°C and annealed. The sample was then frozen,
thawed, boiled for 5 minutes, cooled on ice and allowed to return to room temperature
before performing the hybridization.

Figure 2A shows the results of an experiment in which 13 target RNAs
were spiked into the total RNA pool at a level of 1:3000 (equivalent to a few hundred
copies per cell). RNA frequencies are given as the molar amount of an individual RNA
per mole of total RNA. Figure 2B shows a small portion of the array (the boxed region of
2A) containing probes specific for interleukin-2 and interleukin-3 (IL-2 and IL-3) RNA,
and Figure 2C shows the same region in the absence of the spiked targets. The
hybridization signals are specific as indicated by the comparison between the spiked and
unspiked images, and perfect match (PM) hybridizations are well discriminated from
mismatches (MM) as shown by the pattern of alternating bright rows (corresponding to
PM probes) and darker rows (corresponding to MM probes). The observed variation
among the different perfect match hybridization signals was highly reproducible and
reflects the sequence dependence of the hybridizations. In a few instances, the perfect
match (PM) probe was not significantly brighter than its mismatch (MM) partner because
of cross-hybridization with other members of the complex RNA population. Because the
patterns are highly reproducible and because detection does not depend on only a single
probe per RNA, infrequent cross hybridization of this type did not preclude sensitive and
accurate detection of even low level RNAs.

Similarly, infrequent poor hybridization due to, for example, RNA or probe
secondary structure, the presence of polymorphism or database sequence errors does not 
preclude detection. An analysis of the observed patterns of hybridization and cross 
hybridization led to the formulation of general rules for the selection of oligonucleotide 
probes with the best sensitivity and specificity described herein. 

3) Relationship between Target Concentration and Hybridization Signal 

A second set of spiking experiments was carried out to determine the range 
of concentrations over which hybridization signals could be used for direct quantitation of 
RNA levels. Figure 3 shows the results of experiments in which the ten cytokine RNAs
were spiked together into 0.05 mg/ml of labeled RNA from the B cell (T10) cDNA library
at levels ranging from 1:300 to 1:300,000. A frequency of 1:300,000 is that of an mRNA
present at less than a few copies per cell. In 10 µg of total RNA and a volume of 200 µl, a
frequency of 1:300,000 corresponds to a concentration of approximately 0.5 picomolar and
0.1 femtomole (~6 x 10^7 molecules or about 30 picograms) of specific RNA.
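The figures in the preceding sentence can be checked with a short calculation. The sketch below is illustrative only; it assumes an average RNA size of 1000 ribonucleotides, roughly 3 pmol of RNA molecules per µg, and a rough average of about 330 g/mol per ribonucleotide. The function name and constants are hypothetical.

```python
AVOGADRO = 6.022e23            # molecules per mole
PMOL_PER_UG = 3.0              # assumption: 1 ug of RNA is about 3 pmol (1000 nt average)
GRAMS_PER_MOL = 330.0 * 1000   # assumption: ~330 g/mol per nucleotide, 1000 nucleotides


def spike_amounts(total_rna_ug, volume_ul, frequency):
    """Convert a molar frequency in total RNA into absolute amounts of one target."""
    total_mol = total_rna_ug * PMOL_PER_UG * 1e-12
    target_mol = total_mol * frequency
    molar = target_mol / (volume_ul * 1e-6)
    return {
        "femtomoles": target_mol * 1e15,
        "picomolar": molar * 1e12,
        "molecules": target_mol * AVOGADRO,
        "picograms": target_mol * GRAMS_PER_MOL * 1e12,
    }


amounts = spike_amounts(total_rna_ug=10, volume_ul=200, frequency=1 / 300000)
```

Under these assumptions the result is about 0.1 femtomole, 0.5 picomolar, 6 x 10^7 molecules and roughly 33 picograms, in line with the approximate values quoted in the text.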

Hybridizations were carried out in parallel at 40°C for 15 to 16 hours. The 
presence of each of the 10 cytokine RNAs was reproducibly detected above the 
background even at the lowest frequencies. Furthermore, the hybridization intensity was 
linearly related to RNA target concentration between 1:300,000 and 1:3000 (Figure 3).
Between 1:3000 and 1:300, the signals increased by a factor of 4 - 5 rather than 10 because
the probe sites were beginning to saturate at the higher concentrations in the course of a 15
hour hybridization. The linear response range can be extended to higher concentrations by
reducing the hybridization time. Short and long hybridizations can be combined to
quantitatively cover more than a 10^4-fold range in RNA concentration.

Blind spiking experiments were performed to test the ability to 
simultaneously detect and quantitate multiple related RNAs present at a wide range of 
concentrations in a complex RNA population. A set of four samples was prepared that 
contained 0.05 mg/ml of sense RNA transcribed from the murine B cell cDNA library,
plus combinations of the 10 cytokine RNAs each at a different concentration. Individual
cytokine RNAs were spiked at one of the following levels: 0, 1:300,000, 1:30,000, 1:3000,
or 1:300. The four samples plus an unspiked reference were hybridized to separate arrays
for 15 hours at 40°C. The presence or absence of an RNA target was determined by the
pattern of hybridization and how it differed from that of the unspiked reference, and the
concentrations were detected by the intensities. The concentrations of each of the ten
cytokines in the four blind samples were correctly determined, with no false positives or
false negatives.

One case is especially noteworthy: IL-10 is expressed in the mouse B cells 
used to make the cDNA library, and was known to be present in the library at a frequency
of 1:60,000 to 1:30,000. In one of the unknowns, an additional amount of IL-10 RNA
(corresponding to a frequency of 1:300,000) was spiked into the sample. The amount of
the spiked IL-10 RNA was correctly determined, even though it represented an increase of
only 10 - 20% above the intrinsic level. These results indicate that subtle changes in
expression are sensitively determined by performing side-by-side experiments with 
identically prepared samples on identically synthesized arrays. 
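The 10 - 20% figure follows directly from the ratio of the spiked frequency to the intrinsic frequency. A minimal check, with a hypothetical helper name:

```python
def relative_increase(spike_frequency, intrinsic_frequency):
    """Fractional increase produced by adding a spike on top of an intrinsic level."""
    return spike_frequency / intrinsic_frequency


# A 1:300,000 spike on top of an intrinsic level of 1:30,000 or 1:60,000.
low = relative_increase(1 / 300000, 1 / 30000)    # about 0.10
high = relative_increase(1 / 300000, 1 / 60000)   # about 0.20
```

The spike therefore adds about 10% when the intrinsic IL-10 level is 1:30,000 and about 20% when it is 1:60,000.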

Example 2 

T Cell Induction Experiments Measuring Cytokine mRNAs as a Function 

of Time Following Stimulation. 

The high density arrays of this invention were next used to monitor cytokine 
mRNA levels in murine T cells at different times following a biochemical stimulus. Cells
from the murine T helper cell line (2D6) were treated with the phorbol ester
4-phorbol-12-myristate 13-acetate (PMA) and a calcium ionophore. Poly (A)+ mRNA was
then isolated at 0, 2, 6 and 24 hours after stimulation. Isolated mRNA (approximately 1
µg) was converted to labeled antisense RNA using a procedure that combines a
double-stranded cDNA synthesis step with a subsequent in vitro transcription reaction. 
This RNA synthesis and labeling procedure amplifies the entire mRNA population by 20 
to 50-fold in an apparently unbiased and reproducible fashion (Table 2). 

The labeled antisense T-cell RNA from the four time points was then 
hybridized to DNA probe arrays for 2 and 22 hours. A large increase in the γ-interferon
mRNA level was observed, along with significant changes in four other cytokine mRNAs
(IL-3, IL-10, GM-CSF and TNF-α). As shown in Figure 4, the cytokine messages were not
induced with identical kinetics. Changes in cytokine mRNA levels of less than 1:130,000
were unambiguously detected along with the very large changes observed for γ-interferon.

These results highlight the value of the large experimental dynamic range
inherent in the method. The quantitative assessment of RNA levels from the hybridization 
results is direct, with no additional control hybridizations, sample manipulation, 
amplification, cloning or sequencing. The method is also efficient. Using current 
protocols, instrumentation and analysis software, a single user with a single scanner can 
read and analyze as many as 30 arrays in a day. 

Example 3 

Higher-Density Arrays Containing 65,000 Probes for Over 100 Murine 

Genes 

Figure 5 shows an array that contains over 65,000 different oligonucleotide 
probes (50 µm feature size) following hybridization with an entire murine B cell RNA
population. Arrays of this complexity were read at a resolution of 7.5 µm in less than
fifteen minutes. The array contains probes for 118 genes including 12 murine genes
represented on the simpler array described above, 102 additional murine
genes, three bacterial genes and one phage gene. There are approximately 300 probe pairs
per gene with the probes chosen using the selection rules described herein. The probes 
were chosen from the 600 bases of sequence at the 3' end of the translated region of each 
gene. A total of 21 murine RNAs were unambiguously detected in the B cell RNA 
population, at levels ranging from approximately 1:300,000 to 1:100. 

Labeled RNA samples from the T cell induction experiments (Fig. 4) were 
hybridized to these more complex 118-gene arrays, and similar results were obtained for
the set of genes in common to both chip types. Expression changes were unambiguously 
observed for more than 20 other genes in addition to those shown in Figure 4. 

To determine whether much smaller sets of probes per gene are sufficient 
for reliable detection of RNAs, hybridization results from the 118 gene chip were analyzed 
using ten different subsets of 20 probe pairs per gene. That is to say, the data were 
analyzed as if the arrays contained only 20 probe pairs per gene. The ten subsets of 20 
pairs were chosen from the approximately 300 probe pairs per gene on the arrays. The
initial probe selection was made utilizing the probe selection and pruning algorithms
described above. The ten subsets of 20 pairs were then randomly chosen from those
probes that survived selection and pruning. Labeled RNAs were spiked into the murine B
cell RNA population at levels of 1:25,000, 1:50,000 and 1:100,000. Changes in
hybridization signals for the spiked RNAs were consistently detected at all three levels
with the smaller probe sets. As expected, the hybridization intensities do not cluster as
tightly as when averaging over larger numbers of probes. This analysis indicates that sets
of 20 probe pairs per gene are sufficient for the measurement of expression changes at low
levels but that improvements in probe selection and experimental procedures are
preferred to routinely detect RNAs at the very lowest levels with such small probe sets.
Such improvements, which include, but are not limited to, higher stringency hybridizations
coupled with use of slightly longer oligonucleotide probes (e.g., 25 mer probes), are in
progress.
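The subset analysis can be sketched as follows: draw ten random subsets of 20 probe pairs from the pairs that survived selection and pruning, then compute the average (PM - MM) difference for each subset. The names, the fixed seed and the list-based data layout are assumptions made for the illustration; the text does not say how the random choice was implemented.

```python
import random


def random_subsets(surviving_pairs, n_subsets=10, subset_size=20, seed=0):
    """Draw n_subsets random subsets of probe-pair indices (no repeats within a subset)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [sorted(rng.sample(surviving_pairs, subset_size)) for _ in range(n_subsets)]


def subset_signals(pm, mm, subsets):
    """Average (PM - MM) difference for each subset of probe pairs."""
    return [sum(pm[i] - mm[i] for i in s) / len(s) for s in subsets]
```

Comparing the spread of `subset_signals` across the ten subsets with the value obtained from all surviving pairs gives a rough picture of how much precision is lost by using only 20 pairs per gene.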

Example 4 
Scale Up to Thousands of Genes 

A set of four high density arrays, each containing 25-mer oligonucleotide
probes to approximately 1650 different human genes, provided probes to a total of 6620
genes. There were about 20 probes for each gene. The feature size on arrays was 50
microns. This high density array was successfully hybridized to a cDNA library using 
essentially the protocols described above. Similar sets of high density arrays containing 
oligonucleotide probes to every known expressed sequence tag (EST) are in preparation. 

Example 5 

Direct Scale up for the Simultaneous Monitoring of Tens of Thousands 

of RNAs.

In addition to being sensitive, specific and quantitative, the approach 
described here is intrinsically parallel and readily scalable to the monitoring of very large 
numbers of mRNAs. The number of RNAs monitored can be increased greatly by 
decreasing the number of probes per RNA and increasing the number of probes per array. 
For example, using the above-described technology, arrays containing as many as 400,000
probes in an area of 1.6 cm^2 (20 x 20 µm synthesis features) are currently synthesized and
read. Using 20 probe pairs per gene allows 10,000 genes to be monitored on a single array
while maintaining the important advantages of probe redundancy. A set of four such
arrays could cover the more than 40,000 human genes for which there are expressed
sequence tags (ESTs) in the public data bases, and new ESTs can be incorporated as they
become available. Because of the combinatorial nature of the chemical synthesis, arrays of
this complexity are made in the same amount of time with the same number of steps as the
simpler ones used here. The use of even fewer probes per gene and arrays of higher
density makes possible the simultaneous monitoring of all sequenced human genes on a
single chip, or a small number of small chips.
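The capacity figures in this paragraph are simple arithmetic, and a short sketch makes the relationships explicit. The function names are hypothetical, and the calculation assumes square features that tile the whole area with no borders.

```python
def features_per_area(area_cm2, feature_um):
    """Number of square synthesis features of side feature_um that fit in area_cm2."""
    area_um2 = area_cm2 * 1e8  # 1 cm^2 = 10^8 square micrometers
    return area_um2 / (feature_um ** 2)


def genes_per_array(n_probes, pairs_per_gene):
    """Each probe pair uses two probes (perfect match and mismatch)."""
    return n_probes // (2 * pairs_per_gene)


n_features = round(features_per_area(1.6, 20))   # 400,000 features
n_genes = genes_per_array(n_features, 20)        # 10,000 genes per array
```

Four such arrays would then carry probe sets for 40,000 genes, which matches the estimate given for the genes with expressed sequence tags.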

The quantitative monitoring of expression levels for large numbers of genes 
will prove valuable in elucidating gene function, exploring the causes and mechanisms of 
disease, and for the discovery of potential therapeutic and diagnostic targets. As the body 
of genomic information grows, highly parallel methods of the type described here provide 
an efficient and direct way to use sequence information to help elucidate the underlying 
physiology of the cell. 

Example 6 
Probe Selection Using a Neural Net 

A neural net can be trained to predict the hybridization and cross 
hybridization intensities of a probe based on the sequence of bases in the probe, or on other 
probe properties. The neural net can then be used to pick an arbitrary number of the "best"
probes. When a neural net was trained to do this it produced a moderate (0.7) correlation
between predicted intensity and measured intensity, with a better model for cross 
hybridization than hybridization. 

A) Input/Output Mapping. 

The neural net was trained to identify the hybridization properties of 20-mer 
probes. The 20-mer probes were mapped to an eighty-bit-long input vector, with the first 
four bits representing the base in the first position of the probe, the next four bits 
representing the base in the second position, etc. Thus, the four bases were encoded as 
follows: 

A: 1000 C: 0100 G: 0010 T: 0001 
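
The encoding above can be sketched in portable C as follows. This is an illustrative sketch only, not the listing from the tables below: the names `base_index` and `encode_probe` are hypothetical, and the choice to reject characters other than A, C, G, and T is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_BASE 4

/* Bit position for a base (A=0, C=1, G=2, T=3), or -1 if not a base. */
static int base_index(char base)
{
    switch (toupper((unsigned char)base)) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Writes the encoding of probe into out[0 .. 4*strlen(probe)-1]:
 * four values per base, exactly one of which is 1 (A: 1000, C: 0100,
 * G: 0010, T: 0001). Returns 0 on success, or -1 if the probe contains
 * a character other than A, C, G, or T. */
int encode_probe(const char *probe, float *out)
{
    size_t len = strlen(probe);
    size_t i;

    memset(out, 0, len * BITS_PER_BASE * sizeof(float));
    for (i = 0; i < len; i++) {
        int idx = base_index(probe[i]);
        if (idx < 0)
            return -1;
        out[i * BITS_PER_BASE + (size_t)idx] = 1.0f;
    }
    return 0;
}
```

For a 20-mer the caller supplies an array of 80 floats; a shorter probe fills only its first 4 x length entries.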

The neural network produced two outputs: hybridization intensity and 
crosshybridization intensity. The output was scaled linearly so that 95% of the outputs 
from the actual experiments fell in the range 0 to 1. 



B) Neural Net Architecture. 

The neural net was a backpropagation network with 80 input neurons, one 
hidden layer of 20 neurons, and an output layer of two neurons. A sigmoid transfer 
function was used, s(x) = 1/(1 + exp(-1 * x)), which scales the input values to the range 0 to 1 in a 
non-linear (sigmoid) manner. 



C) Neural Net Training. 



The network was trained using the default parameters from Neural Works 
Professional 2.5 for a backprop network. (Neural Works Professional is a product of 
NeuralWare, Pittsburgh, Pennsylvania, USA). The training set consisted of approximately 
8000 examples of probes and the associated hybridization and crosshybridization 
intensities. 



D) Neural Net Weights. 

Neural net weights are provided in two matrices: an 81 x 20 matrix (Table 3) 
(weights_1) and a 2 x 20 matrix (Table 4) (weights_2). 

Table 3. Neural net weights (81 x 20 matrix) (weights_1). 



-0.0316746 

0.19370709 

0.02240546 

0.16692482 

0.02129388 

0.03684745 

0.00603615 

0.11111762 

0.01354388 



-0.0263491 

-0.0515666 

0.08460676 

-0.0913482 

0.12105247 

-0.0714359 

0.04986877 

0.12571541 

0.1131407 



0.15907079 

0.06444275 

0.14313674 

0.05571244 

0.1405973 

0.02903421 

0.02134438 

0.09278143 

0.06123798 



-0.0353881 

-0.0480836 

0.06798329 

0.22345543 

-0.0066357 

0.09420238 

0.0852259 

0.11373715 

0.14818664 



-0.0529314 

0.29237783 

0.06746746 

0.04707823 

-0.0760119 

0.12839544 

0.13453935 

0.03250757 

0.07090721 



0.09014647 

-0.034054 

0.033717 

-0.0035547 

0.11165894 

0.08542864 

0.03089394 

-0.0460193 

0.05089445 



-0.0635492 -0.0227965 

0.18790121 0.09624594 

0.02378313 0.10295142 

-0.0403537 0.23566079 

-0.0694051 -0.0637478 

-0.0731941 0.08858298 

0.06500423 0.11003297 

0.03036973 0.06836637 

0.17097448 -0.007098 

0.05442215 0.23686385 

0.08683836 0.14047802 

0.08829379 0.17881326 

0.0749867 0.08564588 

0.05022619 0.14544216 

0.08078995 -0.0022168 

0.03405219 0.06140256 

0.11517255 0.17431773 
0.14236264 0.17182963 
-0.0866363 0.11008894 

-0.0163019 0.06256609 

0.38030735 0.28241798 

0.23144296 -0.3207987 

0.46158856 0.20649959 

0.45084599 -0.5829023 

0.55080342 0.30968052 

0.36848074 -0.5196409 

0.47133151 0.30909833 

0.46017882 -0.5331213 

0.33042327 0.4072904 

0.19591335 -0.4028497 

0.19672842 0.16133355 

0.1710967 -0.2728708 

0.03326527 0.22045346 

-0.0752053 -0.0571054 

-0.0838031 0.01667063 

-0.2039919 -0.0532526 

-0.0482095 0.04316666 

-0.0065265 -0.2011867 

0.09420983 -0.0010159 

0.00565713 -0.1990354 

0.11275655 0.01772332 

-0.0850152 -0.1931012 

0.1065109 0.07205399 

-0.0922655 -0.1478272 



0.1081195 0.13419148 
-0.0865264 -0.0126238 
0.05553147 -0.0193289 
0.10335726 0.07325625 
0.2687766 

0.39719725 -0.0709359 
0.0403917 0.02953459 
0.02345118 0.0206452 
-0.0348659 0.09989586 
0.01979881 -9.80E-06 
0.00982503 0.11756061 
0.12465772 0.13134554 
0.05334799 0.14341639 
0.03519877 0.12799838 
0.05439407 -0.0789278 
0.01802093 0.0954654 
0.09664405 0.01782892 
0.02306779 -0.0489743 
0.40543473 

0.16058824 0.14149499 
0.2882407 -0.2227429 
0.56366867 0.35976714 
0.35099933 -0.5071837 
0.51297456 0.33494622 
0.54485208 -0.7155912 
0.33829662 0.21612473 
0.37790757 -0.464661 
0.60684419 0.47586009 
0.24270254 -0.3750777 
0.30585453 0.35896543 
0.21780767 -0.2419563 
0.1234024 0.06987085 
0.98782647 

-0.1834571 0.14263187 
-0.0945634 -0.1137057 
-0.0828366 0.1373803 
-0.1732933 0.0550463 
-0.0434558 -0.0369132 
-0.1768979 -0.2365085 
0.11568499 -0.0690084 
-0.0016695 -0.249011 
0.08498721 0.03673514 
-0.1304159 -0.1723315 
0.08858409 0.14206541 
0.08916269 -0.010634 
0.11497019 -0.0057307 
-0.0627925 -0.024633 
0.11329328 0.2555581 



0.14039235 0.23244983 

0.26901209 -0.0605089 

-0.0079707 0.20967795 

0.07417496 -0.1236805 

-0.0549301 0.08891765 

0.09054346 -0.028868 

0.09500015 0.04572553 

0.11468539 0.14277624 

0.01427337 0.16172577 

0.07312368 0.11417327 

0.00130152 -0.035995 

0.03840308 0.05180788 

-0.0006051 0.19077648 



0.15698175 -0.1197781 

0.34799534 0.38490915 

0.20325871 -0.343972 

0.56459975 0.21605791 

0.43086055 -0.5538613 

0.30799151 0.29871368 

0.41646513 -0.5573701 

0.50172138 0.21158406 

0.28597337 -0.3345993 

0.14083703 0.30998308 

0.24851802 -0.2937264 

0.17847325 0.07593013 

0.1741322 0.05922241 



-0.0715346 -0.0524248 

-0.1040308 0.04263301 

-0.0562212 -0.2127942 

-0.0526818 0.06739104 

-0.0196296 -0.1314755 

-0.0150508 0.14120786 

-0.1509431 -0.0575663 

0.09066539 0.05357879 

-0.1446398 -0.199778 

0.09151162 0.05596334 

-0.0314846 -0.1985286 
0.19862956 
-0.0721622 
0.20344114 

0.02848385 
-0.0742345 
-0.0747124 
-0.1090026 
0.07326921 
-0.0586419 
0.05150735 
-0.0001517 
0.12470152 
-0.049451 
0.18880892 
0.0234996 
0.16712189 
-0.0487184 

0.146753 
-0.0739603 
-0.1548294 
-0.1728483 
-0.0950026 
0.07691807 
-0.035349 
0.04664604 
0.00194441 
0.13777457 
-0.0514387 
0.0994828 
-0.1571046 
0.13411179 

-0.0304715 
0.13328855 
0.0217719 
0.04117605 
0.08965616 
0.08670026 
0.00865917 
-0.1322389 
0.1623435 
-0.0887844 
0.08622501 
-0.0290916 
0.17453311 



-0.0502828 
-0.1506944 
-0.061502 

0.00254791 
-0.0545447 
0.13325705 
-0.0988943 
0.02654305 
-0.08015 
-0.1449667 
-0.0521925 
-0.3589714 
0.05717351 
-0.3259364 
-0.1177034 
-0.0122822 
0.01467591 

-0.0931665 
0.17018235 
-0.0908961 
0.12621336 
-0.1562225 
0.13016214 
-0.302975 
0.08887579 
-0.1631221 
0.00339417 
-0.0722146 
-0.035077 
-0.1713289 
-0.0159559 

-0.0845574 
-0.1492282 
-0.3102229 
0.03997391 
-0.1572192 
0.03785197 
-0.2995701 
0.21433547 
-0.3362183 
0.07691832 
-0.2421202 
-0.0839412 
-0.1529943 



-0.11447 
0.14910588 
-0.1647823 

-0.0646306 
-0.1119258 
-0.0508435 
-0.0445145 
-0.1239398 
-0.0073617 
0.06144469 
0.21106339 
-0.0061972 
0.14784867 
0.04754021 
0.02549919 
-0.109654 
-0.0759871 

-0.1475015 
-0.0636651 
-0.0415557 
-0.1321529 
-0.0917397 
0.10801306 
0.03706082 
-0.0210248 
0.11259725 
-0.2007502 
0.07706029 
-0.106266 
0.14155054 
-0.1296399 

0.17682472 
0.11350834 
0.18922243 
0.06022124 
0.00942572 
0.21052985 
-0.0835971 
0.08046963 
-0.1335399 
0.11459036 
0.00845924 
0.10590381 
0.02726452 



-0.1440073 
0.03297219 



0.02634032 
0.10765317 
-0.1761459 
0.03802977 
0.03043288 
-0.1682889 
0.1005446 
-0.4393073 
0.07370338 
-0.3082401 
-0.0576587 
-0.1671077 
-0.0327367 



0.07284982 

0.04693379 

0.04915113 

-0.1091831 

0.18711324 

-0.3151104 

0.12322487 

-0.1427284 

-0.0984519 

-0.0703103 

0.04593663 

-0.059766 

0.00283311 



-0.0552084 

-0.1121938 

-0.0940011 

-0.1808036 

0.07957069 

-0.3564453 

0.14536868 

-0.1548838 

0.10284293 

-0.056257 

-0.0151014 

-0.1593935 

0.06178628 



0.01366408 
-0.0266356 



-0.0654473 

-0.0606677 

-0.0883804 

-0.0484086 

0.09781751 

0.00400978 

0.22570252 

0.0053312 

0.25447422 

0.01207511 

0.02376083 

0.00582423 

0.01481733 



-0.0609536 

-0.2586751 

-0.0436857 

-0.0989133 

0.04599057 

0.0105284 

0.07198878 

0.09078772 

-0.0939511 

0.1548807 

-0.2334163 

0.13616422 

0.01067419 



0.07044557 

0.02089526 

0.08787836 

0.04742034 

0.12980177 

0.01492627 

0.08446889 

-0.021533 

0.16658102 

0.01970494 

0.19088623 

-0.0399097 

0.06624542 



0.11101657 
-0.2501774 



0.04731949 

0.05693235 

-0.0777852 

-0.0337959 

0.02590732 

0.01282504 

-0.3763289 

0.13283829 

-0.3289591 

-0.1141143 

-0.2828108 

-0.0715723 

-0.0636454 



-0.0945313 

0.15550844 

-0.031472 

0.0294641 

-0.2039073 

0.10938062 

-0.2535323 

0.08646259 

-0.218395 

0.13540466 

-0.0250262 

0.22308858 

-0.360891 



-0.1482136 

0.00104415 

-0.1835242 

-0.0744867 

-0.2440033 

0.04286519 

-0.1689682 

0.0558197 

-0.3004514 

0.08940192 

-0.1967196 

-0.0861852 

0.01004315 
-0.158326 

0.11429903 
0.33529782 
0.09062219 
0.49959001 
0.52447188 
0.78773916 
0.47296903 
0.80210346 
0.45408139 
0.56882453 
0.4490836 
0.39600489 
0.35209069 
0.13266218 

-0.0112394 
0.09505147 
0.16811867 
-0.0380792 
0.42699403 
-0.0381787 
0.74187845 
0.02410492 
0.7597335 
0.15992545 
0.25537059 
-0.1888455 
0.08058657 
-0.2839164 

-0.1147067 
-0.5139894 
0.19038832 
-0.9312411 
-0.5433689 
-0.9422795 
-0.6896155 
-1.0231192 
-0.6568274 
-0.7811472 
-0.4704399 
-0.7735854 
-0.1544528 
-0.4815812 



-0.0149114 

-0.0432327 
0.24581231 
-0.2974442 
0.22195752 
-0.5555881 
0.45518181 
-0.672706 
0.40167108 
-0.7316507 
0.29653791 
-0.4754149 
0.24787127 
-0.203685 
0.20236486 

0.01601524 
-0.0220034 
-0.4498019 
-0.0468904 
-0.6348544 
0.09532065 
-0.8996705 
-0.0632124 
-0.6287012 
-0.1780757 
-0.4526066 
0.1974159 
-0.0768841 
0.12684187 

-0.0084124 
-0.6221746 
0.55414283 
-0.410718 
0.92539561 
-0.6914638 
1.1251011 
-0.5556009 
1.1967098 
-0.5740913 
0.51728982 
-0.3031097 
0.2042688 
-0.5319371 



-0.1479269 

0.14520219 
0.07311282 
0.46336258 
0.32254469 
0.68481833 
0.71273196 
0.69020337 
0.50383294 
0.48975253 
0.4472059 
0.46366793 
0.20359448 
0.25115264 
1.1078833 

0.11363719 
0.0714381 
0.10313182 
0.37975076 
0.00025528 
0.50065184 
0.03180836 
0.73732454 
0.03615654 
0.3820785 
-0.0761788 
0.01620384 
-0.316401 
-0.2450078 

-0.5239977 
-0.3979228 
-1.1652025 
-0.1498093 
-0.9013531 
-0.7839714 
-0.8161536 
-0.7499282 
-1.150661 
-0.4527726 
-0.545236 
-0.4083092 
-0.8989772 
-1.3798244 



0.51860482 
-0.2268714 
0.17145836 
-0.4994924 
0.20251468 
-0.7655811 
0.37193877 
-0.6195157 
0.47984859 
-0.5177853 
0.31378582 
-0.203447 
0.21313109 



-0.1440069 

-0.1994763 

-0.0149997 

-0.7120748 

0.06202703 

-0.7413587 

0.04010354 

-0.8188882 

-0.1248241 

-0.5642462 

-0.0242514 

-0.1306533 

0.09779498 



-0.5021591 

0.30136263 

-0.3686967 

0.55332947 

-0.6145319 

1.4393494 

-0.8204682 

1.281976 

-0.5503616 

0.64911795 

-0.8311051 

-0.0152683 

-0.3088974 



0.19151463 

0.31717882 

0.32802406 

0.75497276 

0.39860719 

0.7155844 

0.47959387 

0.80366057 

0.33738744 

0.36228263 

0.48470935 

0.25734761 

0.12461348 



0.05522444 

0.12304886 

0.47659361 

-0.1078557 

0.57867163 

-0.0193744 

0.82366729 

0.04538922 

0.56647652 

-0.0609947 

0.35473567 

-0.1468564 

0.08537519 



0.02636886 

-0.742976 

-0.4750175 

-1.0870041 

-0.5512772 

-0.7092296 

-0.8957642 

-0.9347371 

-0.6640182 

-0.6970047 

-0.4240301 

-0.2330873 
-0.2014994 



-0.1127352 

0.35736522 

-0.3898261 

0.35112098 

-0.7198414 

0.39701831 

-0.9032337 

0.3884458 

-0.5510914 

0.40129057 

-0.2453159 

0.17168433 

0.10632347 



-0.0711868 

-0.1611445 

-0.4639786 

0.10635795 

-0.6733171 

-0.1180785 

-0.6429569 

-0.1471086 

-0.6294683 

-0.0350918 

-0.3512402 

0.25235301 

-0.0738487 



0.1470097 

-0.4011821 

0.54713631 

-0.4378341 

1.0310978 

-0.894987 

1.3315079 

-0.6562014 

0.84698498 

-0.5759697 

0.37167478 

-0.5839304 

0.11505035 
0.07143499 
0.1549352 
0.44703272 
0.2595928 
0.53066176 
0.1702383 
0.5403164 
-0.092208 
0.22238699 
-0.180493 
0.17421109 
-0.1982318 
0.05979542 
-0.1978694 

0.06230025 
0.1073643 
-0.0272076 
0.33091745 
0.01069087 
0.11231339 
-0.0213237 
0.12980145 
0.15833771 
-0.0696303 
0.05538817 
-0.0677462 
0.14652038 
-0.1929855 
-0.1589592 
-0.0608833 
-0.6194252 
-0.119705 
-0.9705743 
0.02221953 
-0.5077381 
0.21902563 
-0.156256 
0.17164391 
-0.0730809 
0.06996673 
-0.0623277 
0.05119598 

-0.0752745 
-0.090154 
-0.1014201 
-0.0610701 
0.02569587 
-0.0392407 
-0.0261696 
-0.038394 
0.01835199 
0.03802699 
0.01067943 
-0.0772208 
0.06084725 
0.00694158 



0.04816094 
0.21059546 
0.19459446 
0.4913742 
0.1324198 
0.44412452 
0.00849557 
0.25788471 
-0.2092034 
0.15690604 
-0.3717274 
0.19735655 
-0.2521037 
-0.2067173 



-0.0301291 

-0.4705076 

-0.0523894 

-0.8455008 

0.08982921 

-0.7700244 

0.1611405 

-0.3861519 

0.16458821 

-0.0254563 

0.1436436 

0.05625506 

0.0944353 



0.15144217 

0.16360784 

0.31194624 

0.15694356 

0.43900672 

0.10496679 

0.31764683 

-0.2022993 

0.20111787 

-0.1990184 

-0.0215865 

-0.241524 

-0.0492548 



-0.3037405 

-0.0684895 

-0.8030509 

-0.0023983 

-0.8588745 

0.14137991 

-0.5240273 

0.13711917 

-0.1418906 

0.10211211 

-0.2363243 

0.12768924 

0.05238663 



0.32974288 

-0.0938452 

0.19723812 

0.01335303 

0.11676744 

0.06117272 

0.09474246 

0.08167668 

0.04420554 

0.0806741 

0.04131892 

0.16641215 

-0.1150111 

0.26604816 



0.00985043 

0.00704324 

-0.0935401 

0.02156818 

-0.0213131 

-0.0234323 

-0.0100756 

-0.0105376 

0.02605363 

0.03993953 

-0.0267609 

0.09142463 

-0.0687876 



0.07881941 

0.2569764 

0.0913924 

0.21619918 

0.1322203 

0.14693312 

0.10580003 

0.02142166 

0.27427858 

-0.0121658 

0.14418064 

0.02115551 

0.10878915 



-0.0835249 

0.08700065 

-0.0728388 

-0.0909865 

0.11848255 

0.13509636 

-0.0147534 

-0.0161705 

0.05774866 

0.07568218 

0.0897231 

-0.0876383 

0.32776353 



-0.0786668 

-0.0029815 

-0.1259448 

-0.1091328 

0.00333312 

0.14768517 

0.0611263 

0.09951859 

0.05554885 

0.01806534 

0.10942505 

-0.0365961 

0.01934035 

-0.0525739 



0.05454836 

-0.0837616 

-0.0845026 

0.0090488 

-0.2812204 

0.02989549 

-0.1895157 

0.14843601 

-0.3743193 

0.09599103 

-0.0473638 

-0.0962418 

-0.0073082 

0.06086259 



-0.0834711 

0.02468397 

0.10171869 

0.06142418 

0.02039073 

0.09454407 

0.08583955 

0.12351749 

-0.0205463 

-0.0570596 

0.01151769 

0.01007566 

-0.0489736 

-0.1788069 



0.07707115 

0.03531792 

-0.0541042 

-0.167912 

-0.052828 

-0.1860176 

0.09382812 

-0.1327625 

0.12675567 

-0.1523381 

0.09737793 

-0.0049753 

0.10457312 



0.05659099 

-0.1437671 

0.05257236 

-0.098868 

-0.0439769 

-0.0505908 

-0.0001466 

0.10949049 

0.0775801 

0.08384241 

0.07082167 

0.01404589 

-0.0520154 



-0.0285798 

0.10122854 

0.04065102 

0.02574896 

-0.0458286 

0.088718 

-0.4065202 

0.07129322 

-0.1869074 

0.00704122 

-0.2184597 

-0.0406134 

-0.0454775 



0.39202845 -0.6033413 

0.65535748 0.32430753 

0.95144385 -1.2075449 

0.99852085 0.48870567 

1.2572207 -1.5854638 
0.73526824 0.31977594 
0.95438999 -1.2543333 
0.45917389 0.27823627 
0.33946255 -0.5412283 
0.19083619 0.37056214 
0.30190364 -0.3655235 
0.18584418 0.34009755 

0.13698889 -0.0798945 

0.31540671 0.08274947 

0.00119518 -0.1978176 

0.74023747 0.38564634 

0.06987014 -0.5168169 

0.7840901 0.4372991 

0.05702339 -0.5161278 

0.70519674 0.15731441 

0.11747536 -0.612968 

0.81293154 0.18651071 

0.24770954 -0.4320194 

0.54755467 0.08819038 

0.03049339 -0.1913544 

0.1008145 0.01412579 

-0.0048454 0.1204864 

-0.0273505 0.10494121 

0.1325469 0.15324508 

-0.0007111 0.13285491 

-0.118853 0.26435438 

0.07947435 0.07329605 

-0.162177 0.18712705 

0.04106503 0.08498254 

-0.0012895 0.2371086 

0.13412228 0.10756335 

-0.142963 0.09792294 

0.19903891 0.02989559 

-0.0027455 0.16604523 

0.25000233 0.05931267 



0.04679342 0.10158926 
-0.1704439 0.302394 
-0.215752 0.32740423 



0.57940209 -0.0460919 
0.64831889 -1.0950515 

0.94851351 -0.0852669 

1.7470727 -1.7586045 

0.89351815 0.39586932 

1.2270083 -1.2818555 
0.55854511 0.1672449 
0.26928344 -0.9804664 
0.1085042 0.44658452 
0.24114503 -0.3020035 
0.33355939 0.44246852 
4.5490937 

0.3366704 0.17313539 
0.11212139 -0.428847 
0.59532708 -0.0309942 
0.03748908 -0.6475483 
1.0081589 -0.0517421 
0.13783893 -0.8574924 
0.66693234 -0.0496743 
0.08724558 -0.7325026 
0.98160452 0.02407174 
0.03182137 -0.7051651 
0.72470272 0.12951751 
0.22105552 -0.3489864 
0.4782092 -0.098419 
0.42727205 

0.15507312 0.25648347 
0.1988914 0.09454013 
-0.01398 0.08281901 
-0.1658676 0.25348473 
-0.0775707 0.09143513 
-0.0903666 0.10754076 
0.03216886 0.04698242 
-0.0325038 0.29328787 
0.14713244 -0.053306 
-0.0486093 0.05799349 
0.06907349 0.05942665 
0.15750381 -0.0373194 
0.06245366 -0.0775013 
0.22881882 



-0.122116 0.23491009 
-0.0671487 0.33251444 
-0.1597161 0.18950906 



0.53419203 -0.7680888 

0.80829531 0.05049393 

0.94320357 -1.680338 

0.56886804 0.66196042 

1.586942 -1.6365775 

0.71813524 0.37488377 

0.56084049 -0.7980669 

0.62299174 0.53984308 

0.39120093 -0.5676367 

0.39015424 0.09788869 

0.17172456 -0.3479928 



0.01228174 -0.2679709 

0.57447821 -0.0305296 

-0.0107875 -0.7312108 

0.87958473 0.05327692 

0.08651814 -0.761238 

0.90612286 0.06334394 

0.07689167 -0.5775976 

0.65517086 0.29064488 

0.02613025 -0.677594 

0.89682412 0.181806 
0.14626819 -0.3964331 
0.4620938 0.06516677 
-0.0160188 0.07177288 



0.03982652 0.14641231 

-0.0560908 0.07466536 

0.07909692 0.36858437 

0.08835109 0.16466415 

-0.1019902 0.29236633 

0.04456592 0.18368921 

-0.0385783 0.2276271 

0.01249749 0.10016124 

-0.0808243 0.28909287 

0.21323961 -0.0118695 

-0.143813 0.21673524 
0.12471988 0.10462648 
-0.0160873 0.21550164 



-0.0625733 0.19985424 
-0.0581705 0.21095584 
-0.1232446 0.27883759 



-0.0430407 0.04886867 

-0.1322077 0.2981362 

0.10109599 0.23081669 

-0.0808031 0.15750171 

0.13912162 0.04256131 

-0.2270383 0.22945035 

0.01596376 0.03504543 

-0.1284984 0.24145114 

0.00538179 0.05302088 

-0.0861699 0.05814215 

0.20031671 0.23140682 

0.37838998 0.00934576 

-0.038453 0.24550894 

0.58336282 -0.2145292 

0.07741276 0.45081589 

0.85640681 -0.6068144 

-0.0642752 0.37914035 

0.79736245 -0.7102081 

0.02592243 0.37013471 

0.88004726 -0.6990998 

0.30492786 0.39735735 

0.54989374 -0.5660355 

0.17151839 0.39539635 

0.51068121 -0.3502096 

-0.1255455 0.35898197 

0.02952595 -0.0751979 

-0.6262965 -0.1423945 

0.02978903 0.20563391 

-0.7473708 -0.0415357 

0.00290797 0.6284017 

-1.0829539 -0.1822221 

0.06966544 0.75524592 

-0.8823278 -0.3404879 

0.0915129 0.44590429 

-0.499517 -0.4873153 

-0.1106236 0.27437851 

-0.6255118 -0.1046614 

-0.1468192 -0.1719856 

-0.213571 -0.1335077 

0.06424081 -0.0978306 

-0.1032737 0.11563963 

0.05533361 -0.033985 

0.05850215 0.03830531 

-0.012636 -0.1925185 



-0.0914212 0.28192514 

0.1254565 0.15627012 

-0.1617257 0.29508773 

0.08072432 0.12990661 

-0.1625126 0.25232118 

0.18167619 0.00080986 

0.00964208 0.11757879 

0.20540115 0.07580803 

-0.1001294 0.27505419 

0.21307872 0.01372274 
0.16010799 

-0.139213 0.29823828 

0.30729383 -0.2807365 

-0.2378269 0.25939462 

0.65251595 -0.4543131 

-0.1187844 0.35959438 

0.71409059 -0.7180941 

0.14268413 0.41374633 

0.82774776 -0.8136597 

0.23456772 0.24596012 

0.55497372 -0.6593497 
0.1205707 0.22377795 
0.50465524 -0.3791285 
-0.2094818 0.31471297 
0.79502285 

-0.2556099 -0.3040917 

-0.0537339 0.11189342 

-0.5457558 -0.3666513 

0.18283925 0.28153449 

-0.6397845 -0.5606785 

-0.1832336 0.49371469 

-0.9053063 -0.5826979 

-0.0334436 0.50130409 

-0.7808504 -0.4399623 

-0.2889721 0.47303999 

-0.6061368 -0.4166524 

-0.2710638 0.26425925 

-0.4140109 -0.1058299 
-0.7155944 

-0.1169782 0.13909493 

-0.0709175 -0.028875 

-0.049436 0.11520655 

-0.0893732 -0.0066427 
0.13028348 -0.0045112 



0.05275658 0.21014904 

0.04116358 0.08507752 

-0.0405337 -0.0497829 

-0.1935954 0.29120663 

0.04736055 -0.0530935 

-0.1253632 0.15695702 

-0.0230768 0.04350457 

-0.0932236 0.14288881 

0.22654785 0.02395938 

0.04515802 -0.0269269 



0.40640026 -0.067578 

-0.0689575 0.26537073 

0.64761585 -0.3581158 

-0.0671543 0.48592216 

0.71842372 -0.7140775 

0.21169594 0.27888221 

0.75569016 -0.7394939 

0.24068722 0.45081198 

0.67229778 -0.8148533 

0.20656242 0.3752968 

0.46045718 -0.519361 

0.07184427 0.36315975 

0.18174268 -0.1241962 



-0.0942183 -0.0541431 

-0.3791296 -0.3382006 

-0.1922515 0.29512301 

-0.7847292 -0.2313099 

-0.1479581 0.57049137 

-0.6362705 -0.2790937 

-0.114608 0.90401584 

-0.57275 -0.3842527 

-0.1189605 0.59226018 

-0.4015501 -0.2875251 

-0.0637606 0.33875695 

-0.4123208 -0.2157291 

0.02873472 -0.1210428 



-0.0838893 -0.1300299 

-0.1718238 -0.026291 

-0.0279296 -0.0170352 

0.06969514 0.13403182 

0.05260766 -0.2759708 



-0.0395793 

-0.0917266 

0.23327024 

0.10926479 

0.12219627 

-0.1091286 

-0.0210903 

-0.1233738 

0.06584878 



0.03069885 

-0.2185763 

-0.0898143 

-0.1167006 

0.05705986 

-0.075133 

0.11607172 

-0.0760847 

-0.0323083 



0.07913893 

0.04743406 

-0.0578982 

0.18223672 

-0.0505442 

0.02949276 

-0.0943146 

0.00098273 

-0.0581293 



-0.1470363 

-0.0364127 

-0.2096201 

0.09710353 

-0.1334345 

-0.0217044 

-0.1014408 

0.07522969 



0.09080192 

0.00991712 

0.09257686 

0.03838636 

-0.0204458 

-0.0782921 

0.02903902 

0.05794976 



0.19741131 

-0.2093729 

0.00566842 

-0.2026017 

0.01167099 

-0.1160332 

0.02963065 

-0.1959872 



Table 4. Second neural net weighting matrix (2x21) (weights_2). 



-0.5675537 
-0.5328685 
-0.209518 
2.0453076 

0.55343837 
0.12712023 
0.19891988 
3.1242371 



-0.6119734 
0.31165671 
1.6362301 
0.08412334 

0.68506879 
-1.7462951 
-4.0000067 
0.22860088 



0.20069507 
-0.9999997 
-1.9999975 
-0.1645829 

-1.1869608 
0.0818732 
-0.5605077 
1.6726165 



0.26132998 
-0.4128213 
-0.2563241 



0.39551663 
6.111361 
1.3601962 



-0.5071653 
-1.0000007 
0.04389827 



0.38050765 
0.62210494 
1.7318885 



0.2793434 

-0.6456627 

1.7597554 



0.40832204 
0.42921746 
-1.0558798 



E) Code for running the net. 

Code for running the neural net is provided below in Table 5 (neural_n.c) 
and Table 6 (lin_alg.c). 

Table 5. Code for running the neural net (neural_n.c). 



#define local far 
#include <windows.h> 
#include <alloc.h> 
#include "utils.h" 
#include <string.h> 
#include <ctype.h> 
#include <stdio.h> 
#include <math.h> 
#include <mem.h> 
#include "des_util.h" 
#include "chipwin.h" 
#include "lin_alg.h" 

void reportProblem( char local * message, short errorClass); 
char iniFileName[] = "designer.ini"; 

static void sigmoid( vector local * transformMe ){ 
    short i; 

    for( i = 0; i < transformMe->size; i++ ) 
        transformMe->values[i] = 1/(1 + exp( -1 * transformMe->values[i] )); 
} 

static short getNumCols( char far * buffer ){ 
    short count = 1; 

    for( ; *buffer != 0; buffer++ ) 
        if( *buffer == '\t' ) count++; 
    return count; 
} 

static short getNumRows( char far * buffer ){ 
    char far * last, far * current; 
    short count = -1; 

    current = buffer; 
    do{ 
        count++; 
        last = current; 
        current = strchr( last+1, 0 ); 
    }while( current > last+1 ); 
    return count; 
} 

static void readMatrix( matrix local * theMat, char far * buffer ){ 
    short i, j; 
    char far * temp; 

    temp = buffer; 
    for( i = 0; i < theMat->numRows; i++ ){ 
        for( j = 0; j < theMat->numCols; j++ ){ 
            while( isspace( *temp ) || (*temp == 0 && *(temp+1) != 0) ) temp++; 
            sscanf( temp, "%f", &theMat->values[i][j] ); 
            while( !isspace( *temp ) && *temp != 0 ) temp++; 
        } 
    } 
} 



#define MaxNumLines (20) 
#define MaxLineSize (1024) 

short readNeuralNetWeights( matrix local *weights1, matrix local *weights2 ){ 
    char far * buffer; 
    int copiedLength; 
    short numCols, numRows; 

    buffer = farcalloc( MaxNumLines * MaxLineSize, sizeof( char ) ); 
    if( buffer == NULL ){ errorHwnd( "failed to allocate file reading buffer" ); return FALSE; } 
    copiedLength = GetPrivateProfileString( "weights1", NULL, "\0\0", buffer, 
                                            MaxNumLines * MaxLineSize, iniFileName ); 
    if( copiedLength >= (MaxNumLines * MaxLineSize - 10) ){ 
        errorHwnd( "failed to read .ini file" ); return FALSE; 
    } 
    numCols = getNumCols( buffer ); 
    numRows = getNumRows( buffer ); 
    if( !allocateMatrix( weights1, numRows, numCols ) ) return FALSE; 
    readMatrix( weights1, buffer ); 
    copiedLength = GetPrivateProfileString( "weights2", NULL, "\0\0", buffer, 
                                            MaxNumLines * MaxLineSize, iniFileName ); 
    if( copiedLength >= (MaxNumLines * MaxLineSize - 10) ){ 
        errorHwnd( "failed to read .ini file" ); 
        farfree( buffer ); 
        return FALSE; 
    } 
    numCols = getNumCols( buffer ); 
    numRows = getNumRows( buffer ); 
    if( !allocateMatrix( weights2, numRows, numCols ) ){ farfree( buffer ); return FALSE; } 
    readMatrix( weights2, buffer ); 
    farfree( buffer ); 
    return TRUE; 
} 

short runForward( vector local *input, vector local *output, matrix local *weights1, 
                  matrix local *weights2 ){ 
    vector hiddenLayer = {NULL, 0}; 

    if( !allocateVector( &hiddenLayer, (short)(weights1->numRows + 1) ) ) return FALSE; 
    if( !vectorTimesMatrix( input, &hiddenLayer, weights1 ) ){ 
        freeVector( &hiddenLayer ); return FALSE; 
    } 
    sigmoid( &hiddenLayer ); 
    hiddenLayer.values[hiddenLayer.size-1] = 1; 
    if( !vectorTimesMatrix( &hiddenLayer, output, weights2 ) ){ 
        freeVector( &hiddenLayer ); return FALSE; 
    } 
    freeVector( &hiddenLayer ); 
    sigmoid( output ); 
    return TRUE; 
} 

static vector inputVector = {NULL, 0}, outputVector = {NULL, 0}; 
static matrix firstWeights = {NULL, 0, 0}, secondWeights = {NULL, 0, 0}; 

static short beenHereDoneThis = FALSE; 

static short makeSureNetIsSetUp( void ){ 
    if( beenHereDoneThis ) return TRUE; 
    if( !readNeuralNetWeights( &firstWeights, &secondWeights ) ) return FALSE; 
    if( !allocateVector( &inputVector, firstWeights.numCols ) ) return FALSE; 
    if( !allocateVector( &outputVector, secondWeights.numRows ) ) return FALSE; 
    beenHereDoneThis = TRUE; 
    return TRUE; 
} 

void removeNetFromMemory( void ){ 
    freeVector( &inputVector ); freeVector( &outputVector ); 
    freeMatrix( &firstWeights ); freeMatrix( &secondWeights ); 
    beenHereDoneThis = FALSE; 
} 

short nnEstimateHybAndXHyb( float local * hyb, float local * xHyb, char local * probe ){ 
    short probeLength, i; 

    if( !makeSureNetIsSetUp() ) return FALSE; 
    probeLength = (short)(strlen( probe )); 
    if( (probeLength * 4 + 1) != inputVector.size ){ 
        // reportProblem( "Neural net not set up to deal with probes of this length", 0 ); 
        if( (probeLength * 4 + 1) > inputVector.size ){ 
            // reportProblem( "probe being trimmed to do analysis", 1 ); 
            probeLength = (short)(inputVector.size / 4); 
        } 
    } 
    memset( inputVector.values, 0, inputVector.size * sizeof( float ) ); 
    inputVector.values[inputVector.size-1] = 1; 
    for( i = 0; i < probeLength; i++ ) 
        inputVector.values[i * 4 + lookupIndex( tolower( probe[i] ) )] = 1; 
    runForward( &inputVector, &outputVector, &firstWeights, &secondWeights ); 
    *hyb = outputVector.values[0]; 
    *xHyb = outputVector.values[1]; 
    return TRUE; 
} 

Table 6. Code for running the neural net (lin_alg.c). 

#include "utils.h" 
#include "lin_alg.h" 
#include <alloc.h> 

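short allocateMatrix( matrix local * theMat, short rows, short columns ){ 
    short i; 

    theMat->values = calloc( rows, sizeof( float local * ) ); 
    if( theMat->values == NULL ){ errorHwnd( "failed to allocate matrix" ); return FALSE; } 
    for( i = 0; i < rows; i++ ){ 
        theMat->values[i] = calloc( columns, sizeof( float ) ); 
        if( theMat->values[i] == NULL ){ 
            errorHwnd( "failed to allocate matrix" ); 
            for( --i; i >= 0; i-- ) 
                free( theMat->values[i] ); 
            return FALSE; 
        } 
    } 
    theMat->numRows = rows; theMat->numCols = columns; 
    return TRUE; 
} 

short allocateVector( vector local * theVec, short columns ){ 
    theVec->values = calloc( columns, sizeof( float ) ); 
    if( theVec->values == NULL ){ errorHwnd( "failed to allocate vector" ); return FALSE; } 
    theVec->size = columns; 
    return TRUE; 
} 

void freeVector( vector local * theVec ){ 
    free( theVec->values ); 
    theVec->values = NULL; 
    theVec->size = 0; 
} 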

void freeMatrix( matrix local * theMat ){ 
    short i; 

    for( i = 0; i < theMat->numRows; i++ ) 
        free( theMat->values[i] ); 
    free( theMat->values ); 
    theMat->values = NULL; 
    theMat->numRows = theMat->numCols = 0; 
} 

float vDot( float local * input1, float local * input2, short size ){ 
    float returnValue = 0; 
    short i; 

    for( i = 0; i < size; i++ ) 
        returnValue += input1[i] * input2[i]; 
    return returnValue; 
} 

short vectorTimesMatrix( vector local *input, vector local *output, 
                         matrix local *mat ){ 
    short i; 

    if( (input->size != mat->numCols) || (output->size < mat->numRows) ){ 
        errorHwnd( "illegal multiply" ); 
        return FALSE; 
    } 
    for( i = 0; i < mat->numRows; i++ ) 
        output->values[i] = vDot( input->values, mat->values[i], input->size ); 
    return TRUE; 
} 



Example 7 
Generic Difference Screening 

High density arrays comprising arbitrary (haphazard) probe oligonucleotides 
for generic difference screening were produced by shuffling (randomizing) the masks used 
in light-directed polymer synthesis. The resulting arrays contained more than 34,000 pairs 
of 25-mer arbitrary probe oligonucleotides. The oligonucleotides in each pair differed by a 
single nucleotide at position 13. 

After hybridization, washing, staining, and scanning as described above, 
data files (containing information regarding probe identity and hybridization intensity) 
were created. 

Differences in intensity between the two oligonucleotides comprising each 
probe pair k (where k ranges from 1 to 34,320) were calculated. More specifically, the 
intensity difference between the oligonucleotides of pair k for replicate j of sample i was 
calculated as: 

X_{ijk1} - X_{ijk2} 

where X is the hybridization intensity, i indicates which sample (in this case sample 1 or 
2), j indicates which replicate (in this case replicate 1 or 2 for each sample), k is 
the probe pair (in this case 1 . . . 34,320), and the final index 1 indicates one member of 
the probe pair, while 2 indicates the other member of the probe pair. 

Figures 16a, 16b, and 16c illustrate the differences between replicate 1 
and 2 of sample 1 (Fig. 16a, the normal cell line) and between replicate 1 and replicate 2 of 
sample 2 (Fig. 16b, the tumor cell line) for each probe. Thus, Fig. 16a plots the value of 
(X_{11k1} - X_{11k2}) - (X_{12k1} - X_{12k2}) for k = 1 to 34,320 on the vertical axis and k on the 
horizontal axis. The two replicates were normalized based on the average ratio of 
(X_{11k1} - X_{11k2})/(X_{12k1} - X_{12k2}) for all probe pairs (i.e., after normalization, the average 
ratio should approximate 1). Similarly, Fig. 16b plots the value of 
(X_{21k1} - X_{21k2}) - (X_{22k1} - X_{22k2}) after normalization between the two replicates based on 
the average ratio of (X_{21k1} - X_{21k2})/(X_{22k1} - X_{22k2}). Figure 16c plots 
the differences between sample 1 and 2 averaged over the two replicates. This value is 
calculated as ((X_{21k1} + X_{22k2})/2) - ((X_{11k1} + X_{12k2})/2) after normalization between the two 
samples based on the average ratio of [(X_{21k1} + X_{22k2})/2]/[(X_{11k1} + X_{12k2})/2]. 
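
The replicate comparison described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes the per-pair differences for the two replicates have already been computed into arrays d1 and d2, that normalization means scaling the second replicate by the average per-pair ratio, and that pairs with a zero denominator are left out of the average. The function names are hypothetical.

```c
#include <stddef.h>

/* Average of d1[k] / d2[k] over the pairs with d2[k] != 0.
 * Returns 1.0 (no rescaling) if no pair has a usable denominator. */
double mean_ratio(const double *d1, const double *d2, size_t n)
{
    double sum = 0.0;
    size_t used = 0, k;

    for (k = 0; k < n; k++) {
        if (d2[k] != 0.0) {
            sum += d1[k] / d2[k];
            used++;
        }
    }
    return used ? sum / (double)used : 1.0;
}

/* out[k] = d1[k] - scale * d2[k], where scale is the average ratio above,
 * so that the average of d1[k] / (scale * d2[k]) is 1 after scaling. */
void normalized_difference(const double *d1, const double *d2, size_t n,
                           double *out)
{
    double scale = mean_ratio(d1, d2, n);
    size_t k;

    for (k = 0; k < n; k++)
        out[k] = d1[k] - scale * d2[k];
}
```

Here d1[k] would hold X_{11k1} - X_{11k2} and d2[k] would hold X_{12k1} - X_{12k2} for the Fig. 16a comparison; the same routine applies to the second sample with the corresponding arrays.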

Figures 17a, 17b, and 17c show the filtered data. Figure 17a shows the 
relative change in hybridization intensities of replicates 1 and 2 of sample 1 for the 
difference of each oligonucleotide pair. After normalization between replicates (see 
above), the ratio is calculated as follows: if the absolute value of 
(X_{11k1} - X_{11k2})/(X_{12k1} - X_{12k2}) > 1, then the ratio = (X_{11k1} - X_{11k2})/(X_{12k1} - X_{12k2}); 
else the ratio = (X_{12k1} - X_{12k2})/(X_{11k1} - X_{11k2}) (the inverse). The ratio of replicates 1 
and 2 of sample 2 for the difference of each oligonucleotide pair, normalized, filtered, 
and plotted the same way as in Figure 17a, is shown in Fig. 17b. The ratio is calculated as 
in Fig. 17a, but based on the absolute value of (X_{21k1} - X_{21k2})/(X_{22k1} - X_{22k2}) and 
(X_{22k1} - X_{22k2})/(X_{21k1} - X_{21k2}). Fig. 17c shows the ratio of sample 1 and sample 2 
averaged over two replicates for the difference of each oligonucleotide pair. The ratio is 
calculated as in Fig. 17a, but based on the absolute value of 
[(X_{21k1} + X_{22k2})/2]/[(X_{11k1} + X_{12k2})/2] and [(X_{11k1} + X_{12k2})/2]/[(X_{21k1} + X_{22k2})/2] 
after normalization as in Fig. 16c. 
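
The ratio rule used for Figs. 17a to 17c (take the quotient if its absolute value exceeds 1, otherwise take the inverse) can be written as a short function. This is a sketch under stated assumptions: the name `folded_ratio` is hypothetical, and returning `None` when either difference is zero is a choice made here, since the text does not define the ratio in that case.

```python
def folded_ratio(d1, d2):
    """Return d1/d2 if |d1/d2| > 1, otherwise the inverse d2/d1.

    d1 and d2 are the normalized probe-pair differences being compared
    (for example X_11k1 - X_11k2 and X_12k1 - X_12k2 for Fig. 17a).
    """
    if d1 == 0 or d2 == 0:
        return None  # undefined here; handling of zeros is an assumption
    ratio = d1 / d2
    return ratio if abs(ratio) > 1 else d2 / d1
```

With this rule a fourfold change is reported as 4 whichever replicate or sample is larger, so the result has magnitude of at least 1 for every probe pair.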

The oligonucleotide pairs that show the greatest differential hybridization 
between the two samples can be identified by sorting the observed hybridization ratio and 
difference values. The oligonucleotides that show the largest change (increase or decrease) 
can be readily seen from the ratio plot of samples 1 and 2 (Fig. 17c). These differences do 
not appear to be in the background noise. Based on the identified oligonucleotide pair 
sequences, a gene or EST with the suspected sequence tag can be searched for in the 
sequence databases, such as GENBANK, to determine whether the gene has been cloned 
and characterized. If the search is negative, appropriate primers can be made to obtain the 
cDNA or part of the cDNA directly from mRNA, cDNA, or from a cDNA library. 

From Figures 16a and 16b, it is observed that several oligonucleotide pairs 
show large differences between two replicates for the same sample. It is believed that this 
results from differential expression in a given tissue. These oligonucleotide pairs detect 
genes that are likely highly expressed, so the deviation between replicates for these pairs is 
larger than for those oligonucleotide pairs that bind to nucleotides expressed at low levels 
(i.e., the standard deviation of the mean is proportional to the mean). That is also why the 
relative change between two samples is a better indicator of differential 
expression between two samples (see Fig. 17c). In order to determine which 
oligonucleotide pairs are of greatest interest, the absolute and relative difference measures 
could be combined into a scoring function. 
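
The text does not specify a scoring function. Purely as an illustration of how absolute and relative measures might be combined, the sketch below multiplies the absolute difference by the base-2 logarithm of the absolute ratio. Both the product form and the names `score` and `rank_pairs` are assumptions introduced here, not part of the described method.

```python
import math


def score(diff, ratio):
    """Hypothetical score: absolute difference weighted by log2 of the absolute ratio.

    Assumes |ratio| >= 1, as produced by the ratio rule used for Fig. 17,
    so the logarithm is never negative.
    """
    return abs(diff) * math.log2(abs(ratio))


def rank_pairs(diffs, ratios):
    """Return probe-pair indices ordered from highest to lowest score."""
    scored = [(score(d, r), k) for k, (d, r) in enumerate(zip(diffs, ratios))]
    return [k for _, k in sorted(scored, reverse=True)]
```

A pair with a large absolute difference but a ratio near 1 (typical of highly expressed genes with noisy replicates) scores low under this choice, as does a pair with a high ratio but a difference close to background.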

Increasing the number of related oligonucleotide pairs (increased 
redundancy) and employing two-color hybridization/detection schemes are expected to 
help reduce the background variation. This allows more sensitive detection of small 
differences and decreases the noise and occurrence of false positives. The 25 mer array 
used in this example is a small subset of all possible 25 mers; thus, increasing the total 
number of oligonucleotide pairs will greatly increase the ability to detect changes in genes 
of unknown sequences by allowing more complete coverage of the available sequence 
space. 

Example 8 
Nucleic Acid End Labeling 

Several RNA transcripts as well as a full mRNA sample from mouse cells 
were fragmented by heat in the presence of Mg2+. A riboA6 (deoxyribonucleic acid 6 mer 
poly A) labeled with either fluorescein or biotin at the 5' end was then ligated to the 
fragmented RNA using RNA ligase under standard conditions. 

The labeling appeared to be efficient and the hybridization pattern obtained 
using the labeled RNA as a probe was similar to one obtained using RNA that was labeled 
during an in vitro transcription step. 

Example 9 
Quantification of Labeling Efficiency 

Quantification of the labeling efficiency is accomplished by spiking 
experiments in which specific full-length unfragmented RNA species are spiked into the 
total mRNA pool at different concentrations prior to the end-labeling procedure. The 
relative concentrations of the spiked RNA in the pool can then be measured by 
hybridization to a high density array of target oligonucleotides prepared as described 
above. This permits evaluation of the ability to detect particular RNA species at low 
concentration in the mRNA pool. 

Example 10 
PCR Labeling of Nucleic Acids 

Polymerase Chain Reaction (PCR) 

20 ul PCR reactions substituted with 10% biotin-dUTP were 
conducted and the quantity of each PCR product was estimated with gel analysis. 
Approximately 250 fmoles of each PCR product was pooled. A Pharmacia S300 
sephacryl column (cat # 27-5130-01) was prepared with a 1 minute prespin at 3000 x 
g followed by a 200 ul H2O wash and spin at 3000 x g for 1 more minute. The 
pooled PCR product was loaded and spun for 2 minutes at 3000 x g. 

The column was discarded and the eluate was speed vacuumed to 
dryness. 

DNase Fragmentation 

The dried down PCR pool was resuspended in 13 ul H2O from the NEN 
DuPont End Labeling Kit (cat # NEL824). 2.5 ul CoCl2 and 12.5 ul TdT buffer were 
added. Gibco BRL DNase I was diluted to 0.25 U/ul using 10 mM Tris pH 8. 1 ul of 
diluted DNase was added to the PCR product pool, which was incubated for 6 minutes at 
37°C, denatured for 10 minutes at 99°C, and cooled to 4°C. The total volume was 29 ul. 

Terminal Transferase (TdT) Labeling 

To the fragmented PCR pool, 2 ul of TdT enzyme (from NEN kit, 2 
U/ul stock) was added and 4 ul NEN kit biotin-ddATP was then added. The final 
volume was 35 ul, and the reaction was incubated at 37°C for 1.5 hr. 
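
The stated totals can be checked by adding up the component volumes. The sketch below reads the volumes given in the fragmentation and TdT labeling steps as microliters (13, 2.5, 12.5 and 1, then 2 and 4); the dictionary layout and function name are illustrative only.

```python
# Component volumes in microliters, as given in the two steps above.
fragmentation = {
    "H2O": 13.0,
    "CoCl2": 2.5,
    "TdT buffer": 12.5,
    "diluted DNase I": 1.0,
}
tdt_labeling = {
    "TdT enzyme": 2.0,
    "biotin-ddATP": 4.0,
}


def total_volume(*mixes):
    """Sum the volumes of every component in the given mixes."""
    return sum(volume for mix in mixes for volume in mix.values())


frag_total_ul = total_volume(fragmentation)                 # after DNase addition
final_total_ul = total_volume(fragmentation, tdt_labeling)  # after TdT and label

# Enzyme amounts implied by the stated stock concentrations.
dnase_units = 0.25 * fragmentation["diluted DNase I"]  # 0.25 U/ul stock
tdt_units = 2.0 * tdt_labeling["TdT enzyme"]           # 2 U/ul stock
```

The sums reproduce the 29 ul and 35 ul totals given in the text, and imply 0.25 U of DNase I and 4 U of TdT per reaction.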

Hybridization 

The 35 ul labeled target was split into two 17.5 ul aliquots, one for a 
coding chip (GeneChip containing sense-strand sequences and permutations thereof) 
and one for the non-coding (antisense) chip. 182.5 ul of 2.5 M TMACl (Sigma 5 M 
stock diluted 1:2 using 10 mM Tris pH 8) was added. Triton X-100 was added to a 
final concentration of 0.001%. In certain experiments, 4 ul of 100 nM control 
oligonucleotide was added to the solution rather than at the stain step. 

The mixture was denatured at 95°C for 5 minutes, added directly to the 
chip cartridge and hybridized with mixing at 37°C for 60 minutes. 

Staining and Washing 

The hybridization solution was removed from the flow cell used in the 
GeneChip system (Affymetrix, Inc., Santa Clara, CA) and the chamber was manually 
rinsed 3X with 6X SSPE/0.001% Triton X-100 to remove TMACl. 

A phycoerythrin stain solution was prepared as follows: 190 ul 6X 
SSPE/0.001% Triton X-100 + 10 ul of 20 mg/ml acetylated BSA + 0.4 ul stock 
phycoerythrin (Molecular Probes Cat # S866) + 4 ul fluorescein control oligo 100 nM 
stock. 

The staining solution was added to the flow cell with mixing at room 
temperature for 5 minutes. The staining solution was removed from the flow cell, which 
was then manually rinsed 3X with washing buffer. 

The chip was washed on a hybridization station (the GeneChip system, 
Affymetrix, Inc.) using 6X SSPE/0.001% Triton X-100 at 35°C. 9 fill/drain changes 
of fresh wash solution were used and scanning took place in this buffer. Target 
sequences were accurately identified in this experiment. 

Example 11 
End Labeling PCR Product 

PCR product was fragmented and end labeled using TdT from 
Boehringer Mannheim. After the PCR amplification, 5 ul of a 50 ul PCR reaction was 
run on a 1% agarose gel to estimate total yield of the amplification reaction. To 
fragment the DNA, the remaining 45 ul of solution was combined with DNAse I 
(diluted in H2O to a final concentration of 5 U DNAse I/ug DNA) and reacted for 15 
minutes at 31°C. The DNAse was then heat killed for 10 minutes at 95°C. The 
fragmented DNA solution was then held at 4°C until ready for the terminal transferase 
reaction. 

The terminal transferase reaction mixture consisted of the fragmented 
PCR sample, 20 ul 5X terminal transferase reaction buffer, 6 ul 25 mM CoCl2 
(final concentration 1.5 mM), 1 ul of fluorescent dideoxynucleotide triphosphates 
(ddNTP final concentration 10 uM), 2 ul of Boehringer Mannheim terminal 
transferase (TdT, final concentration 50 U/reaction), and H2O up to 100 ul volume. 
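
The final concentrations quoted for the terminal transferase mixture follow from simple dilution (C1 x V1 = C2 x V2) into the 100 ul reaction. The sketch below checks the stated values; the ddNTP and TdT stock concentrations are not given in the text and are only back-calculated here, so they should be read as implied values rather than stated ones.

```python
TOTAL_UL = 100.0  # final reaction volume stated in the text


def final_concentration(stock_conc, stock_vol_ul, total_vol_ul=TOTAL_UL):
    """Dilution formula C2 = C1 * V1 / V2; units of the result follow stock_conc."""
    return stock_conc * stock_vol_ul / total_vol_ul


cocl2_mM = final_concentration(25.0, 6.0)  # 6 ul of 25 mM CoCl2
buffer_x = final_concentration(5.0, 20.0)  # 20 ul of 5X reaction buffer

# Back-calculated (implied) stock concentrations, not stated in the source:
implied_ddntp_stock_uM = 10.0 * TOTAL_UL / 1.0  # 1 ul giving 10 uM final
implied_tdt_stock_U_per_ul = 50.0 / 2.0         # 50 U delivered in 2 ul
```

The check confirms the stated 1.5 mM CoCl2 and a 1X buffer, and implies a 1 mM ddNTP stock and a 25 U/ul enzyme stock.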

The reaction was incubated for 30 minutes at 37°C. The whole 
reaction volume was then transferred to a 1.7 ml tube, brought up to 500 ul with 5X 
SSPE, 0.05% Triton, then hybridized and scanned normally. 

Protocols for the 50 ul PCR reaction are found in the instructional 
materials accompanying the GeneChip™ HIV PRT Assay (Affymetrix, Sunnyvale, 
CA). 

Example 12 
CIAP Improves Base Calling 

In certain fragment end labeling experiments, the accuracy of base 
calling in a GeneChip system was improved when calf intestinal alkaline phosphatase 
(CIAP) was used during fragmentation with DNAse. See Figure 18. 

CIAP is useful in degrading any nucleotides that were not incorporated 
in any previous amplification, transcription, or other polymerase 
reactions. Such degradation prevents the incorporation of those nucleotides in 
subsequent reactions, such as tailing and labeling reactions. 

Example 13. 
Post-Hybridization End Labeling 

Post-hybridization end labeling experiments were performed. After 
hybridization of a target to a probe array in the GeneChip system, the targets were 
labeled using terminal transferase (shown as TdTase) as shown in Figure 19. 

Post-hybridization labeling was shown to yield better results when the 
probe array (Chip) was pre-reacted as shown in Figures 20 and 21. 

Figure 21 also shows the results of a DNAse titration experiment. 
The various titration experiments are shown below in Table 7. 

These results show that base calling accuracy can be impacted by the length of the 
target fragments. Such results further demonstrate the utility of the methods disclosed 
herein. 

Other experiments have shown that 1 U of DNAse is particularly useful in 
obtaining ideal fragment lengths. 

Example 14 
End-Labeling (Tailing) with Poly T 

The nucleic acids tailed with poly-A or poly-A analogs (labeled or unlabeled) 
using methods similar to those set forth in Example 13 can be labeled using labeled 
poly-T, as shown in Figure 22. 

Example 15. 
Synthesis of Fluorescent Triphosphate Labels 

To 0.5 umoles (50 uL of a 10 mM solution) of the amino-derivatized 
nucleotide triphosphate, 3'-amino-3'-deoxythymidine triphosphate (1) or 2'-amino-2'- 
deoxyuridine triphosphate (2), in a 0.5 ml Eppendorf tube was added 25 uL of a 1 M 
aqueous solution of sodium borate, pH 7, 87 uL of methanol, and 88 uL (10 umol, 20 
equiv) of a 100 mM solution of 5-carboxyfluorescein-X-NHS ester in methanol. The 
mixture was vortexed briefly and allowed to stand at room temperature in the dark for 
15 hours. The sample was then purified by ion-exchange HPLC to afford the 
fluorescein derivatives Formula 3 or Formula 4, below, in about 78-84% yield. 

Experiments suggest that these molecules are not substrates for 
terminal transferase (TdT). It is believed, however, that these molecules would be 
substrates for a polymerase, such as Klenow fragment. 

Example 10 
Synthesis of as-Triazine-3,5[2H,4H]-diones 

The analogs as-triazine-3,5[2H,4H]-dione ("6-aza-pyrimidine") 
nucleotides (see, Fig. 23a) are synthesized by methods similar to those used by Petrie, 
et al., Bioconj. Chem. 2: 441 (1991). 

Other useful labeling reagents are synthesized including 5-bromo- 
U/dUTP or ddUTP. See for example Lopez-Canovas, L. et al., Arch. Med. Res 25: 
189-192 (1994); Li, X., et al., Cytometry 20: 172-180 (1995); Boultwood, J. et al., J. 
Pathol. 148: 61 ff. (1986); Traincard, et al., Ann. Immunol. 1340: 399-405 (1983); and 
Figures 23a and 23b set forth herein. 

Details of the synthesis of nucleoside analogs corresponding to all of 
the above structures (in particular those of Fig. 23b) have been described in the 
literature. Known procedures can be applied in order to attach a linker to the base. 
The linker modified nucleosides can then be converted to a triphosphate amine for 
final attachment of the dye or hapten, which can be carried out using commercially 
available activated derivatives. 

Other suitable labels include non-ribose or non-2'-deoxyribose- 
containing structures some of which are illustrated in Fig. 23c and sugar-modified 
nucleotide analogues such as are illustrated in Fig. 23d. 

Using the guidance provided herein, the methods for the synthesis of 
reagents and methods (enzymatic or otherwise) of label incorporation useful in 
practicing the invention will be apparent to those skilled in the art. See, for example, 
Chemistry of Nucleosides and Nucleotides 3, Townsend, L.B. ed., Plenum Press, New 
York, at chpt. 4, Gordon, S. The Synthesis and Chemistry of Imidazole and 
Benzimidazole Nucleosides and Nucleotides (1994); Gen Chem. Chemistry of 
Nucleosides and Nucleotides 2, Townsend, L.B. ed., Plenum Press, New York (1994); 
can be made by methods similar to those set forth in Chemistry of Nucleosides and 
Nucleotides 1, Townsend, L.B. ed., Plenum Press, New York, at chpt. 4, Gordon, S. 
"The Synthesis and Chemistry of Imidazole and Benzimidazole Nucleosides and 
Nucleotides" (1994); Lopez-Canovas, L. et al., Arch. Med. Res 25: 189-192 (1994); Li, 
X., et al., Cytometry 20: 172-180 (1995); Boultwood, J. et al., J. Pathol. 148: 61 ff. 
(1986); Traincard, et al., Ann. Immunol. 1340: 399-405 (1983). 

Example 11 

Biotin-chem Link (Boehringer-Mannheim) 

The labeling density is supposed to be 1 biotin per 10 bases. 
Coordinative, non-covalent binding of Biotin-chem-Link (BCL) to N7 of adenosine and 
guanosine involves heating 1 ug RNA or DNA + 1 ul BCL in a 20 ul volume at 85°C for 30 
minutes. 

RNA labeling experiment (4 sets of 4 pooled RNA transcripts) 

Very poor labeling and/or hybridization (5 pM could not be seen at all; 20 pM was 
very weak). Samples may have been lost after labeling when microcon-100s were used 
to remove unincorporated label. RNA was fragmented after labeling. It is believed 
that this should not be a problem (BM tech help). 



BCL labeling of dsDNA 

Low signal, background across the entire chip. No discrimination. 



Fast-Tag (Vector Labs) (RNA) 

Should get 1 biotin per 10-20 bases. Five reactions were run: 

a) RNA1+RNA2+RNA3 (5 pmoles each, total of 5.2 ug) + 25 ul Fast Tag reagent 

b) RNA1+RNA2+RNA3 (9 pmoles each, total of 9.4 ug) + 25 ul Fast Tag reagent 

c) RNA1+RNA2+RNA3 (18 pmoles each, total of 19 ug) + 40 ul Fast Tag reagent 

d) RNA4+RNA5+RNA6 (10 pmoles each, total of 8.7 ug) + 25 ul Fast Tag reagent 

e) RNA7+RNA8+RNA9 (10 pmoles each, total of 11.4 ug) + 25 ul Fast Tag reagent 

The heat method was used to link S-S to RNA. The result: 20X lower hybridization 
signal than the same targets labeled by the IVT method. 

Example 12 

RNA ligase/bio-a6 end labeling 

This experiment generally involved the following steps: a) RNA was 
fragmented; b) RNA fragments were 5' phosphorylated with polynucleotide 
kinase/ATP; and c) the 5' end of the RNA was ligated to the 3' end of BioA6 using 
RNA ligase. This is illustrated by the following formula: 

5' biotin-AAAAAA-OH 3' + 5' P-RNA-OH 3' = 5' bioAAAAAA-RNA 3' 

Previously this technique was used to label total cellular mRNA which 
was hybridized to unpackaged chips (high density oligonucleotide arrays) (on 2x3 
slides) in a 10 ul volume. Lack of mixing was a significant problem and resulted in 
low hybridization intensities. In vitro transcription (IVT) labeled RNA under these 
conditions gave 10X higher signal than bio-A6/RNA ligase labeled target. 

In other experiments, 3 different ratios of bio-A6:RNA were used: 

1) 1x bioA6 (= 0.5 nmoles biotin-A6 per 1 ug RNA); 

2) 2x bioA6; and 

3) 4x bioA6. 

After labeling, the sample was spun through a microcon-EZ and microcon-3 to 
remove enzymes and dilute out buffer components. 

Bio-A6 labeled target hybridized to chips (high density oligonucleotide 
arrays) gave approximately the same hyb. intensity as in vitro transcription (IVT) 
labeled target. 

Staining was for 15 minutes with PE at normal conc. No significantly 
higher signal or background was seen with 4x as much bioA6 per ug RNA. 

For these experiments, BioA6 (5' biotin-AAAAAA RNA) was 
ordered from Genset. 

Example 13 
Preparation of Gene-Specific Transcripts 

Template DNA preparation 

Linearization of vector: 

If the gene is not already cloned in a vector with T3 and T7 RNA 
polymerase promoter sites flanking the insert, see PCR amplification below. 

The vector is linearized with an enzyme that cuts at the 3' end of the 
insert for sense transcripts, or at the 5' end for antisense transcripts. The insert 
sequence is checked to verify that the restriction enzyme (RE) does not cut internally. 
In a preferred embodiment, a restriction enzyme is chosen that does not produce 3' 
protruding ends. 

Following linearization, an aliquot of the sample is run on a gel (next 
to uncut vector) to verify complete digestion. 

The sample is optionally treated with Proteinase K (100-200 ug/ml) at 
50°C for 20 min - 1 hour to remove enzyme or residual RNases (used in plasmid miniprep 
protocols). 

The linearized DNA is purified by phenol/chloroform extraction 
and ethanol precipitation or 3-4 rounds of microcon-100 concentration/redilution (see 
below). 

PCR amplification 

Amplification is only preferred if the desired region of the gene is not 
already in a cloning vector with RNA polymerase promoters. 

Starting with genomic DNA (or cDNA), amplify the ORF of interest 
(or region of the gene represented on the chip) using PCR primers with 5' T3/T7 RNA 
polymerase promoter sequences and 3' gene-specific sequences. 

The following 5' sequence has worked well (with 19-21 gene-specific 
bases added to the 3' end). 

5'-GAATTGTAATACGACTCACTATAGGGAGG-[+19-21 gene-specific bases]-3' 

The 5' end consists of: 

a) six 5' flanking bases of your choice - not part of the promoter 
sequence, but necessary for maximum IVT efficiency. 

b) 17 bases of the core T7 RNA polymerase promoter sequence 

c) 1st 6 bases transcribed (sequence of +1 to +6 can affect 
efficiency) 

The other PCR primer would then contain the T3 RNA polymerase promoter 
sequence at the 5' end. The following sequence has worked well: 

5'-AGATGCAATTAACCCTCACTAAAGGGAGA-[+19-21 gene-specific bases]-3' 

The 5' end consists of: 

a) six 5' flanking bases (sequence can vary from this example) 

b) 17 bases of core T3 RNA polymerase promoter sequence 

c) +1 to +6 transcribed bases 
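
The primer layout described above (six flanking bases, a 17-base core promoter, six transcribed bases, then 19-21 gene-specific bases) can be assembled and checked programmatically. The sketch below uses the T7 prefix quoted in the text; the function names and the example gene-specific sequence are hypothetical and are not taken from the source.

```python
# T7 promoter-containing 5' prefix quoted in the protocol text.
T7_PREFIX = "GAATTGTAATACGACTCACTATAGGGAGG"


def split_prefix(prefix):
    """Split a 29-base prefix into (6 flanking, 17 core promoter, 6 transcribed) bases."""
    if len(prefix) != 29:
        raise ValueError("expected a 29-base promoter prefix")
    return prefix[:6], prefix[6:23], prefix[23:]


def build_primer(prefix, gene_specific):
    """Append 19-21 gene-specific bases to the 3' end of the promoter prefix."""
    bases = gene_specific.upper()
    if not 19 <= len(bases) <= 21:
        raise ValueError("gene-specific portion should be 19-21 bases")
    if set(bases) - set("ACGT"):
        raise ValueError("gene-specific portion must contain only A, C, G, T")
    return prefix + bases


# Hypothetical 19-base gene-specific sequence, for illustration only.
example_primer = build_primer(T7_PREFIX, "ATGGCTAGCAAAGGAGAAG")
```

A T3 primer can be built the same way by passing the T3 prefix as the first argument; the resulting primers are 48-50 bases long.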




Amplify the desired sequence using standard PCR conditions with 1st 
5 cycles at the annealing temp, best suited for the gene specific part of the primers 
alone (typically 55-58°C), followed by 25 cycles with annealing at 7<TC. Check PCR 
products on an agarose gel (3-5 ul of a 100 ul rxn). It is not necessary to quantify at 
this stage. 

Optional Proteinase K treatment: 

Add 1 ul of Proteinase K (20 mg/ml) (Ambion) to the remainder of the 
PCR reaction and incubate 20 min to 1 h at 50-60°C. This is usually not necessary, 
but if the in vitro transcription (IVT) products appear degraded while the control IVT 
product included in the kit (described later) is full length, then this step may be added 
prior to the microcon-100 and IVT. 

Microcon 50/100 purification 

Other purification methods are being tested. Ethanol precipitation can 
be substituted for microcon-50 purification. CAUTION: Microcons may leak. Save all 
flow-through portions. 

Add 380 ul RNase-free water to the PCR product and concentrate 
using a microcon-100 or microcon-50 as suggested in instructions (Amicon). Repeat 
the dilution and concentration 2-3 times. The final concentrated sample should be 
5-100 ul. 

In vitro transcription labeling with biotin 

For maximum yield use Ambion's T3 (#1338) or T7 (#1334) 
Megascript system (their proprietary buffer allows higher nucleotide concentrations 
without inhibiting the polymerase). (Read Ambion instructions and suggestions in kit 
book!) 

Perform IVT as suggested, but with (1:3) biotinylated:unlabeled CTP 
and UTP. Do not interchange T3 and T7 10x nucleotides that come with the 
Megascript kits. 

For example, make an NTP mix for 4 IVT-labeling reactions as follows: 

8 ul Ambion's T7 10x ATP [75 mM] 
8 ul Ambion's T7 10x GTP [75 mM] 
6 ul Ambion's T7 10x CTP [75 mM] 
6 ul Ambion's T7 10x UTP [75 mM] 
15 ul Bio-11-CTP [10 mM] (ENZO #42818) 
15 ul Bio-16-UTP [10 mM] (ENZO #42814) 

For each IVT-labeling reaction, add (at room temp. - not on ice): 

14.5 ul NTP mix 
2.0 ul 10x T7 transcription buffer (Ambion) 
*1.5 ul purified PCR product (not more than 1 ug) 
2.0 ul 10x T7 enzyme mix (Ambion) 

*Do NOT add more than 1 ug of DNA to the IVT reaction. Higher concentrations of 
DNA actually inhibit the reaction and result in LOWER yields. 

Final rNTP composition: 

7.5 mM ATP 
7.5 mM GTP 
5.625 mM cold UTP/1.875 mM bio-UTP 
5.625 mM cold CTP/1.875 mM bio-CTP 

Incubate 4-6 hours at 37°C. Shorter incubation times may be sufficient for some 
transcripts or when maximum yield is not important. 
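
The final rNTP composition can be re-derived from the mix recipe, which serves as a check when scaling the mix to a different number of reactions. The sketch below takes the volumes and stock concentrations from the protocol (read as microliters and millimolar); the variable and function names are illustrative.

```python
# (volume in ul, stock concentration in mM) for an NTP mix sized for 4 reactions.
NTP_MIX = {
    "ATP": (8.0, 75.0),
    "GTP": (8.0, 75.0),
    "CTP": (6.0, 75.0),
    "UTP": (6.0, 75.0),
    "bio-CTP": (15.0, 10.0),
    "bio-UTP": (15.0, 10.0),
}
MIX_PER_REACTION_UL = 14.5
# NTP mix + 10x buffer + template + 10x enzyme mix
REACTION_UL = 14.5 + 2.0 + 1.5 + 2.0


def final_rntp_mM():
    """Final concentration (mM) of each nucleotide in one IVT reaction."""
    mix_total_ul = sum(volume for volume, _ in NTP_MIX.values())
    fraction = MIX_PER_REACTION_UL / mix_total_ul  # share of the mix per reaction
    return {
        name: volume * fraction * stock / REACTION_UL
        for name, (volume, stock) in NTP_MIX.items()
    }


concentrations = final_rntp_mM()
bio_to_cold_ctp = concentrations["bio-CTP"] / concentrations["CTP"]
```

The 58 ul mix divides into four 14.5 ul portions, each 20 ul reaction contains 7.5 mM total of every nucleotide, and the biotinylated to unlabeled ratio for CTP and UTP comes out at 1:3, matching the text.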

Optional: DNase 1 treatment 

Add 1 ul RNase-free DNase I (provided with Ambion kit) to each 
reaction and mix well. Incubate 15-20 min. at 37°C. 

Optional - Proteinase K treatment 

This step may help reduce background caused by nonspecific protein 
binding to chip and to Streptavidin-phycoerythrin: 

Add RNase-free water to IVT reactions to a final volume of 99 ul. 
Add 1 ul of Ambion's 20 mg/ml proteinase K. 
Incubate at 50°C 20-30 min. 

Microcon purification 

Several other purification methods have been tested - many did not 
sufficiently remove rNTPs or had low yields. A protocol for Carboxy bead-based 
purification (Archana Nair) looks very promising and will soon be used in place of 
microcon purification. 

Note: Set aside an aliquot of the IVT reaction before further 
purification. Setting aside 1% will enable troubleshooting of this step if necessary. 

1. Add 400 ul DEPC water to sample and concentrate sample with 
microcon 50 or 100 (as suggested by Amicon). SAVE ALL 
FLOW-THROUGH FRACTIONS. 
2. Repeat dilution/concentration 3-4 times. Final volume can be 
10-100 ul. 

See comments below. 

Check IVT product(s) on a gel 

Usually it is sufficient to check ~0.01-1% of the reaction on a 
nondenaturing agarose/TBE gel. Samples are heated to 65°C for 15 minutes prior to 
electrophoresis. A single band close to the expected size is usually observed. 
If there is enough space on the gel, run 2 or 3 different dilutions of both the unpurified 
and purified IVT products on a gel (~0.01%, 0.1% and 1% of each). Gels can be 
stained with Sybr Green II (FMC) at a 1:10,000 dilution in 1x TBE buffer (more 
sensitive than ethidium bromide). 

If precise determination of transcript size is desired, a denaturing gel 
can be run with biotinylated RNA standards (available from Ambion). 

Quantify transcript yield by A260 

Expect 75-150 ug RNA per 1 ug starting DNA template. For 
quantitation of purified transcript, about 1% of the concentrated sample diluted with 
water (or TE) into a final volume of 60-70 ul (for a microcuvette) should give 
absorbance readings within the accurate range (0.1-1 OD). For accurate pipetting 
volumes (> 1 ul), it is usually necessary to make a serial dilution first (for example, 
make a 1/10 dilution of your RNA sample, then measure 10% of the dilution in 60-70 
ul final vol.). Always be sure to take a blank reading in the same cuvette, using 
the same buffer/water that the RNA sample is diluted into. 

Since accurate quantitation of pure transcript is essential for 
meaningful spiking experiments, extra care should be taken to verify that excess 
nucleotides from the IVT reaction have been sufficiently removed and are not 
contributing to the A260. 

The microcon flow through should be saved and checked for A260. If significant 
absorbance is present in the last flow through, the RNA should be subjected to 
additional rounds of dilution and concentration until no significant absorbance is 
detected at 260 nm. 
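
The yield calculation implied by the A260 reading can be sketched as follows. The conversion factor of about 40 ug/ml per A260 unit for single-stranded RNA (1 cm path length) is a widely used approximation and is not stated in the protocol; it, the function names, and the example numbers are assumptions for illustration.

```python
# Common approximation for single-stranded RNA at 1 cm path length.
# Assumption: this factor is not given in the protocol text.
RNA_UG_PER_ML_PER_A260 = 40.0


def in_accurate_range(a260, low=0.1, high=1.0):
    """True if the reading falls in the 0.1-1 OD range the protocol calls accurate."""
    return low <= a260 <= high


def rna_yield_ug(a260, dilution_factor, sample_volume_ul, path_cm=1.0):
    """Total RNA (ug) in the concentrated sample from a blank-corrected A260.

    dilution_factor: cuvette volume divided by the volume of sample added.
    sample_volume_ul: volume of the concentrated sample being quantified.
    """
    conc_ug_per_ml = a260 / path_cm * RNA_UG_PER_ML_PER_A260 * dilution_factor
    return conc_ug_per_ml * sample_volume_ul / 1000.0


# Hypothetical example: 0.5 ul (1%) of a 50 ul sample diluted into 60 ul
# gives a dilution factor of 120; a blank-corrected A260 of 0.4 is read.
example_yield_ug = rna_yield_ug(a260=0.4, dilution_factor=120.0, sample_volume_ul=50.0)
```

In this made-up case the estimate is about 96 ug, which falls inside the 75-150 ug range the protocol expects per 1 ug of template.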

Since microcon filtration devices occasionally leak, it is advisable to 
save all flow-through fractions. If the transcript RNA concentration in the 
retained/collected sample is much lower than predicted, the flow-through fractions 
can be re-concentrated using a fresh cartridge (then diluted and reconcentrated at least 
4 times). 

Example 14 
Labeling Total mRNA from Cells/Tissues 
Starting material: Good quality poly A + RNA from at least 5 x 10M x 
10 6 cells *(0.1ug-5ug poly A+). It is more economical to start with more poly A+ 
RNA (up to 5 ug), but if material is limited, as little as 0.1 ug of poly(A)+ can yield a 
sufficient quantity of labeled RNA target (10 ug after IVT labeling/amplification). 

Double Stranded cDNA Synthesis: 

This protocol is a supplement to instructions provided in Gibco BRL's 
Superscript Choice System. Before proceeding read the Gibco protocol. Follow Gibco 
BRL's Superscript Choice System for cDNA Synthesis, except use the T7-(T)24 
sequence (below) for priming the reverse transcription-first strand cDNA synthesis 
instead of the oligo(dT) or random primers provided with the kit. 

T7-(T)24 primer: 

5'-GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG-(T)24-3' 



First Strand Synthesis 

Use 0.1 ug-5 ug poly(A)+ RNA and adjust the amount of H2O and enzyme 
as indicated in the BRL instructions. For example: 

3 ul DEPC-water 
4.5 ul (1 ug/ul) mRNA 
1 ul (100 pmol/ul) T7-(T)24 primer 

1. Mix/Spin/Incubate at 70°C for 10 minutes. 

2. Chill on ice. 

3. Add the following components (on ice) to the RNA/primer mix: 

4 ul of 5X 1st strand cDNA buffer 
2 ul 0.1 M DTT 
1 ul [10 mM] dNTP mix 

4. Incubate at 37°C for 2 minutes. 

5. Add 4.5 ul Superscript II reverse transcriptase/mix well. Use 1 ul 
SSII RT per ug RNA. For <1 ug RNA, use 1 ul RT. 

6. Incubate for 1 hour at 37°C. 

Final Reaction Composition (20 ul vol.): 

50 mM Tris-HCl, pH 8.3 
75 mM KCl 
3 mM MgCl2 
10 mM DTT 
500 uM each: dATP, dCTP, dGTP, dTTP 
100 pmol T7-(T)24 primer 
4.5 ug mRNA 
900 U RT (200 U per ug mRNA) 
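
The listed final composition can be cross-checked against the volumes added in the first strand steps. The sketch below uses only volumes and stock concentrations given in the example (in microliters); the 200 U/ul reverse transcriptase stock is inferred from "900 U" being delivered in 4.5 ul and is therefore an assumption, as are the names used.

```python
# Volumes (ul) added in the first strand example.
VOLUMES_UL = {
    "DEPC-water": 3.0,
    "mRNA (1 ug/ul)": 4.5,
    "T7-(T)24 primer (100 pmol/ul)": 1.0,
    "5X first strand buffer": 4.0,
    "0.1 M DTT": 2.0,
    "10 mM dNTP mix": 1.0,
    "Superscript II RT": 4.5,
}
REACTION_UL = sum(VOLUMES_UL.values())


def final(stock_conc, volume_ul):
    """Final concentration after dilution into the full reaction volume."""
    return stock_conc * volume_ul / REACTION_UL


buffer_x = final(5.0, VOLUMES_UL["5X first strand buffer"])  # fold concentration
dtt_mM = final(100.0, VOLUMES_UL["0.1 M DTT"])               # 0.1 M = 100 mM
dntp_uM = final(10000.0, VOLUMES_UL["10 mM dNTP mix"])       # 10 mM = 10000 uM
primer_pmol = 100.0 * VOLUMES_UL["T7-(T)24 primer (100 pmol/ul)"]
rt_units = 200.0 * VOLUMES_UL["Superscript II RT"]           # assumed 200 U/ul stock
```

The volumes sum to the stated 20 ul, and the derived values (1X buffer, 10 mM DTT, 500 uM each dNTP, 100 pmol primer, 900 U RT) agree with the composition listed.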



Second Strand Synthesis 

1. Place first strand reactions on ice (quickly spin down). 

2. Add: 

95 ul DEPC-H2O 
30 ul 5x Second Strand Buffer 
3 ul [10 mM] dNTP mix 
1 ul [10 U/ul] E. coli DNA Ligase 
4 ul [10 U/ul] E. coli DNA Polymerase I 
1 ul [2 U/ul] RNase H 

Final Composition (150 ul): 

25 mM Tris-HCl, pH 7.5 
100 mM KCl 
5 mM MgCl2 
10 mM (NH4)2SO4 
0.15 mM b-NAD+ 
250 uM each: dATP, dCTP, dGTP, dTTP 
1.2 mM DTT 
65 U/ml DNA ligase 
250 U/ml DNA Polymerase I 
13 U/ml RNase H 

3. Mix/spin down/incubate at 16°C for 2 hours. 

4. Add 2 ul [10 U] T4 DNA Polymerase. 

5. Incubate 5 min. at 16°C. 

6. Add 10 ul 0.5 M EDTA/store at -20°C. 



CLEANUP 

Phenol/chloroform extraction 

Optional: To reduce sample loss during extraction, see the PLG 
protocol below. 

1. Add an equal volume (162 ul) of (25:24:1) 
phenol:chloroform:isoamyl alcohol (saturated with 10 mM 
Tris-HCl pH 8.0/1 mM EDTA - Sigma). 

2. Vortex/spin 5 minutes @ 14000 x g. Transfer aqueous phase to 
a fresh 1.5 ml tube. 

PLG-Phenol/Chloroform Extraction 

Phase Lock Gels (PLG)* form an inert sealed barrier between the 
aqueous and organic phases of phenol-chloroform extractions. The solid barrier 
allows more complete recovery of the sample (aqueous phase) and minimizes 
interface contamination of the sample. PLGs are sold as premeasured aliquots in 1.5 
ml tubes, to which the user directly adds sample and phenol-chloroform. 

1. Pellet the Phase Lock Gel (1.5 ml tube with PLG I-light) in a 
microcentrifuge for 20-30 seconds [PLG I-heavy should also 
work, but we haven't specifically tested it for this application]. 

2. Transfer the entire (162 ul) cDNA sample to the PLG tube. 

3. Add an equal volume (162 ul) of (25:24:1) 
phenol:chloroform:isoamyl alcohol (saturated with 10 mM Tris-HCl 
pH 8.0/1 mM EDTA - Sigma). 

4. Mix by inverting (DO NOT VORTEX). PLG will not become 
part of the suspension. Microcentrifuge at full speed (12,000 
x g or greater) for 2 min. 

5. Transfer the aqueous upper phase to a fresh 1.5 ml tube. 

*PLG I is available from 5 Prime-3 Prime, Inc., cat. #pl-175850 for 50 or #pl- 
188233 for 200. 

Microcon-50 Purification 

Other purification methods are being tested. Ethanol precipitation can 
be substituted for microcon-50 purification. CAUTION: Microcons may leak. Save all 
flow-through portions. 

1. Add 300 ul of 5 mM Tris pH 7.5 to sample. 

2. Concentrate by spinning through a Microcon-50 column 
(Microcon-50 columns, Amicon part #42416) following 
directions supplied by Amicon. 

3. Repeat dilution/concentration 3-4 times; collect and set aside 
flow through in case of column failure. 

Concentrate to a final volume of 5-10 ul if possible, taking care not to allow the 
cartridge to spin to dryness. Collect upper volume. 

In Vitro Transcription Labeling with Biotin 

For maximum yield use Ambion's T3 (#1338) or T7 (#1334) 
Megascript System (their proprietary buffer allows higher nucleotide concentrations 
without inhibiting the polymerase). 

Perform IVT as suggested, but with (1:3) biotinylated:unlabeled CTP 
and UTP. Do not interchange T3 and T7 10X nucleotides that come with the 
Megascript System. Read the Ambion detailed instructions and suggestions before 
proceeding. 

NTP Labeling Mix 

To make NTP labeling mix for 4 IVT-labeling reactions combine: 

8 ul Ambion's T7 10x ATP [75 mM] 
8 ul Ambion's T7 10x GTP [75 mM] 
6 ul Ambion's T7 10x CTP [75 mM] 
6 ul Ambion's T7 10x UTP [75 mM] 
15 ul Bio-11-CTP [10 mM] (ENZO #42818) 
15 ul Bio-16-UTP [10 mM] (ENZO #42814) 

IVT Reaction 

1. For each reaction, combine the following at room temperature, 
not on ice: 

14.5 ul NTP labeling mix 
2.0 ul 10x T7 transcription buffer (Ambion) 
*1.5 ul ds cDNA (0.1-1 ug is optimal: see note below!) 
2.0 ul 10x T7 enzyme mix (Ambion) 

*Do NOT add more than 1 ug of ds cDNA to the IVT reaction. Higher concentrations 
of DNA actually inhibit the reaction and result in LOWER yields. 



Final rNTP Composition: 

7.5 mM ATP 
7.5 mM GTP 
5.625 mM cold UTP/1.875 mM bio-UTP 
5.625 mM cold CTP/1.875 mM bio-CTP 

2. Incubate 4-6 hours at 37°C. (Shorter incubation times may be 
sufficient for some transcripts or when maximum yield is not 
important.) 

Store unused NTP labeling mix at -20°C. 



CLEANUP 

Optional DNase I Treatment 

1. Add 1 ul RNase-free DNase I (provided with Ambion kit) to 
each reaction and mix well. 

2. Incubate 15-20 min. at 37°C. 

Optional Proteinase K Treatment 

This treatment may help reduce background caused by nonspecific 
protein binding to chip and to Streptavidin-phycoerythrin. 

1. Add RNase-free water to IVT reactions to a final volume of 99 ul. 

2. Add 1 ul of Ambion's 20 mg/ml Proteinase K. 

3. Incubate at 50°C 20-30 minutes. 



Microcon Purification 

Several other purification methods have been tested - many did not 
sufficiently remove rNTPs or had low yields. A protocol for Carboxy bead-based 
purification (Archana Nair) looks very promising and will soon be used in place of 
microcon purification. Set aside an aliquot of the IVT reaction before further 
purification. Setting aside 1% will enable troubleshooting of this step if necessary. 

1. Add 400 ul DEPC water to sample and concentrate sample with 
microcon 50 or 100 (as suggested by Amicon). SAVE ALL 
FLOW-THROUGH FRACTIONS. 

2. Repeat dilution/concentration 3-4 times. Final volume can be 
10-100 ul. 

3. Since microcon filtration devices occasionally leak, it is 
advisable to save all flow-through fractions. If transcript RNA 
concentration in the retained/collected sample is much lower 
than predicted, the flow-through fractions can be 
re-concentrated using a fresh column, then diluted and 
reconcentrated at least 4 times. 

Notes on Yield 

1. Starting with 4-5 ug poly(A)+ RNA for the ds cDNA synthesis and
using 20% of the purified ds cDNA sample for the IVT, expect
~75-125 ug labeled RNA per IVT reaction.

2. Reading ~1% of the concentrated sample diluted with water (or
TE) into a final volume of 60-70 ul (for a microcuvette) should
give absorbance data within the accurate range (0.1-1 OD). For
accurate pipetting volumes (> 1 ul), it is usually necessary to
make a serial dilution first. For example, make a 1/10 dilution
of your RNA sample, then measure 10% of the dilution in 60-70 ul
final volume. Be sure to take blank readings in the same
cuvette and use the same buffer/water that was used for diluting
the RNA sample.

3. For accurate quantitation of labeled RNA, extra care should be
taken to verify that excess nucleotides from the IVT reaction
have been sufficiently removed and are not contributing to the
absorbance reading.

The microcon flow-through should be saved and checked for A260. If significant
absorbance is present in the last flow-through, the RNA should be subjected to
additional rounds of dilution and concentration until no significant absorbance is
detected at 260 nm.
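
As a rough illustration of the yield estimate, the sketch below converts an A260 reading into an RNA concentration. It assumes the commonly used conversion of 40 ug/ml per A260 unit for single-stranded RNA at a 1 cm path length; that factor, the function names, and the example numbers are assumptions, not values from the protocol. A microcuvette with a different path length would need the `path_cm` argument adjusted.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # assumed standard factor for ssRNA, 1 cm path


def in_accurate_range(a260, low=0.1, high=1.0):
    """True if the reading lies in the 0.1-1 OD window named in the notes."""
    return low <= a260 <= high


def rna_conc_ug_per_ul(a260, dilution_factor, path_cm=1.0):
    """Concentration of the undiluted sample in ug/ul."""
    ug_per_ml = a260 / path_cm * RNA_UG_PER_ML_PER_A260 * dilution_factor
    return ug_per_ml / 1000.0


def total_yield_ug(a260, dilution_factor, sample_volume_ul, path_cm=1.0):
    """Total RNA in the concentrated sample."""
    return rna_conc_ug_per_ul(a260, dilution_factor, path_cm) * sample_volume_ul


if __name__ == "__main__":
    # Hypothetical example: 1/10 dilution, then 6 ul of that into 60 ul
    # (another 1/10), giving an overall dilution factor of 100.
    reading = 0.5
    print(in_accurate_range(reading))
    print(total_yield_ug(reading, dilution_factor=100, sample_volume_ul=50))
```

With these hypothetical numbers the sample would hold 100 ug of RNA, which falls inside the 75-125 ug range quoted in note 1.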

Check unfragmented samples on gel.

Electrophorese the labeled RNA before fragmentation to observe the
size distribution of labeled transcripts. Samples can be heated to 65°C for 15 minutes
and electrophoresed on agarose/TBE gels to get an approximate idea of the transcript
size range. If there is enough space on the gel, run 2 or 3 different dilutions of both the
unpurified and purified IVT products on a gel (~0.01%, 0.1% and 1% of each). Gels
can be stained with Sybr Green II (FMC) at a 1:10,000 dilution in 1x TBE buffer
(more sensitive than ethidium bromide).

Alternatively, for more accurate estimations of the size distribution of
the RNA population pre and post fragmentation, electrophorese samples through a
denaturing gel using biotinylated RNA molecular weight markers (Ambion).

Example 15 
Direct labeling of DNA with Psoralen-Biotin

The psoralen-biotin reagent comes lyophilized and can be bought
separately or as part of the "Rad-Free Universal Oligo Labeling and Hybridization Kit"
(Schleicher & Schuell). It is actually cheaper (per nmole) when bought with the kit so
you might as well get the extra kit components and save money. The Rad-Free
Universal Oligo Labeling and Hybridization kit: catalog # 483122 (contains 20
nmoles of Psoralen-biotin). The same kit with UV Long wave 365 nm lamp:
#483124.

1. Spin down then resuspend the lyophilized psoralen-biotin
reagent in either:

a) 14 ul of DMF if you may label fragmented DNA/RNA
or oligonucleotides with some of the reagent (it needs to
be more concentrated) OR

b) 56 ul of DMF if you will definitely be labeling before
fragmentation.

Labeling has been performed both before and after fragmenting
with similar results, but it is easier to do before fragmentation
because the nucleic acid cannot be labeled in high salt (>20 mM).
2. Adjust the RNA/DNA concentration to 0.5 ug/10 ul (200 ul
for 10 ug of DNA), less than 20 mM salt. pH does not matter
(pH 2.5-10) so you can just use sterile or DEPC-treated water to
resuspend or dilute the RNA/DNA. Plasmid DNA needs
to be linearized.

If RNA/DNA is in high salt, it can be diluted and concentrated
using the appropriate size of microcon (even microcon 3 works
for fragmented material but takes ~70 min per cycle).

3. Boil sample 10 min/quick chill on ice (store on ice 5 min-3 hrs)
[important - ds DNA will become cross-linked by reagent if
strands are not separated before labeling]

4. In dim light add 1 ul of psoralen-biotin reagent per 20 ul of
DNA/RNA solution (1 ul psoralen-biotin that was resuspended
in 56 ul DMF per ug DNA/RNA). *If psoralen-biotin was
resuspended in 14 ul, dilute the amount you will need for
labeling 1:3 in DMF (1 ul conc. psoralen-biotin + 3 ul DMF).

5. Transfer solution into a well of a 96-microwell plate on ice
(up to 150 ul/well).

6. Place 365 nm UV lamp directly on top of plate so that light 
source is about 2 cm from the sample. Irradiate samples for 
one hour. 

7. Transfer samples to microcentrifuge tubes and add 2 volumes
of H2O-saturated n-butanol to extract unincorporated psoralen-
biotin. Vortex/centrifuge 1 min.

8. Discard butanol (top layer). Repeat extraction.

9. Fragment as you would normally. Denature as normal before
hybridization (10 min 99-100°C).

*Longer UV irradiation does not improve results.
*Adding more psoralen-biotin per ug DNA/RNA does not seem to improve results.
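
The volume bookkeeping in steps 2, 4 and 5 can be sketched as follows. The ratios (0.5 ug per 10 ul, 1 ul working reagent per 20 ul sample, 150 ul per well, 1:3 dilution of the 14 ul stock) come from the protocol; the function name and the returned dictionary layout are illustrative assumptions.

```python
import math

UG_PER_ALIQUOT = 0.5       # protocol: 0.5 ug nucleic acid ...
ALIQUOT_UL = 10.0          # ... per 10 ul
SAMPLE_UL_PER_REAGENT_UL = 20.0  # protocol: 1 ul working reagent per 20 ul
MAX_WELL_UL = 150.0        # protocol: up to 150 ul per microwell


def psoralen_plan(nucleic_acid_ug, resuspension_dmf_ul=56):
    """Volumes needed to label a given mass of DNA/RNA.

    resuspension_dmf_ul is 56 (working strength) or 14 (4x concentrated
    stock, diluted 1 part stock + 3 parts DMF before use).
    """
    if resuspension_dmf_ul not in (14, 56):
        raise ValueError("protocol describes only 14 ul or 56 ul resuspension")
    sample_ul = nucleic_acid_ug * ALIQUOT_UL / UG_PER_ALIQUOT
    working_ul = sample_ul / SAMPLE_UL_PER_REAGENT_UL
    if resuspension_dmf_ul == 56:
        stock_ul, extra_dmf_ul = working_ul, 0.0
    else:
        stock_ul = working_ul / 4.0
        extra_dmf_ul = working_ul - stock_ul
    wells = math.ceil((sample_ul + working_ul) / MAX_WELL_UL)
    return {
        "sample_ul": sample_ul,
        "working_reagent_ul": working_ul,
        "stock_ul": stock_ul,
        "extra_dmf_ul": extra_dmf_ul,
        "wells": wells,
    }


if __name__ == "__main__":
    print(psoralen_plan(10))       # 56 ul resuspension
    print(psoralen_plan(10, 14))   # 14 ul resuspension
```

For 10 ug of DNA this gives the 200 ul sample volume quoted in step 2 and 10 ul of working-strength reagent, split over two wells.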

Example 16 
Psoralen-Biotin Labeling Experiments 

Labeling RNA by standard protocol

Pool of 4 different fragmented RNA transcripts labeled with psoralen-biotin

Results of hybridization to chip (5 pM each): PB labeled targets showed
approximately 5x lower intensities than IVT (bio-U + C) labeled targets



Labeling before vs. after fragmentation: 

No significant difference in hybridization intensities 

Ratio of psoralen-biotin to RNA 

Labeling with a 4x higher ratio of PB : RNA does not significantly 

affect hybridization intensities on chips. 

Time of labeling reaction/uv lamp intensity 

No significant difference between 1 vs. 3 hr. labeling or 15-20
mW/cm2 (Affy lamp) vs. 5-7 mW/cm2 (S&S lamp) intensity at 365 nm.

Psoralen-biotin 

Psoralens: planar, tricyclic compounds 

Psoralen-biotin: psoralen conjugated to biotin via 14-atom linker arm. 
High affinity for nucleic acids 
Intercalates into DNA/RNA 

Becomes covalently attached when irradiated with long wave 
UV light. 

Example 17 
Terminal Transferase End-Labeling Protocol 

(This protocol has been tested and optimized thoroughly only with PRT 440S
chips.)



DNAse fragmentation

This will have enough for 4 labeling reactions:

4 pmol of HIV PCR target (3.17 ug of 1.2 kb insert)    X ul
DNAse (BRL)                                            X ul (1 U/ug)
Calf Alkaline Phosphatase, 1 U/ul (BRL)                2.5 ul (2.5 U/rx)
Dilution CAP Buffer (BRL)                              2.5 ul
MgCl2                                                  X ul (1.25 mM)
Bring up with H2O to 100 ul

37°C for 15 min.
95°C for 10 min.
4°C on hold.
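
The figure of 3.17 ug for 4 pmol of a 1.2 kb insert can be reproduced with a short sketch. It assumes an average mass of 660 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the protocol; the function names are illustrative.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one base pair


def dsdna_mass_ug(pmol, length_bp):
    """Mass in ug of `pmol` picomoles of a double-stranded fragment."""
    # pmol * g/mol gives picograms; 1e-6 converts pg to ug.
    return pmol * length_bp * AVG_BP_MASS_G_PER_MOL * 1e-6


def dnase_units(mass_ug, units_per_ug=1.0):
    """DNase needed at the protocol's 1 U/ug ratio."""
    return mass_ug * units_per_ug


def per_reaction(total_volume_ul, total_pmol, n_reactions):
    """Volume and target amount carried into each labeling reaction."""
    return total_volume_ul / n_reactions, total_pmol / n_reactions


if __name__ == "__main__":
    mass = dsdna_mass_ug(4, 1200)
    print(f"{mass:.2f} ug target, {dnase_units(mass):.2f} U DNase")
    print(per_reaction(100, 4, 4))
```

Splitting the 100 ul digest four ways gives 25 ul and 1 pmol per reaction, which matches the fragmented DNA input listed for the TdT labeling step.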



TdT Labeling

F-N6-ddATP, F-ddATP, F-ddCTP, and F-ddUTP are comparably
labeled in the reaction. We decided to use F-N6-ddATP.

Fragmented DNA sample           25 ul (1 pmol)
5X TdT Buffer (Boehringer)      20 ul (1X)
25 mM CoCl2 (Boehringer)        10 ul (2.5 mM)
F-N6-ddATP (1 mM)               1 ul (10 uM)
TdT (25 U/ul) (Boehringer)      1 ul (25 U/rx)
H2O                             43 ul

37°C for 30 min.
95°C for 5 min.
4°C on hold.
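
The final concentrations in parentheses follow from the stock concentrations and the 100 ul reaction volume. A minimal sketch of that check, with stock values and volumes taken from the table and with illustrative names:

```python
# (component, stock concentration, unit, volume in ul), from the table.
TDT_REAGENTS = [
    ("TdT buffer", 5.0, "X", 20.0),
    ("CoCl2", 25.0, "mM", 10.0),
    ("F-N6-ddATP", 1000.0, "uM", 1.0),  # 1 mM stock expressed in uM
    ("TdT", 25.0, "U/ul", 1.0),
]
OTHER_VOLUMES_UL = {"fragmented DNA": 25.0, "H2O": 43.0}


def total_volume_ul():
    """Total reaction volume from all listed components."""
    reagent_ul = sum(volume for _, _, _, volume in TDT_REAGENTS)
    return reagent_ul + sum(OTHER_VOLUMES_UL.values())


def final_concentration(stock, volume_ul, total_ul):
    """C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    total = total_volume_ul()
    print(f"total volume: {total} ul")
    for name, stock, unit, volume in TDT_REAGENTS:
        print(name, final_concentration(stock, volume, total), unit)
```

The enzyme line works out to 0.25 U/ul, which is the same as the 25 U per reaction given in the table.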

PRT 440S Hybridization (Rela Station)

Labeled sample                      100 ul
10X SSPE; 0.1% Triton X-100         300 ul
Control (100 nM) 213 Oligos         5 ul
H2O                                 195 ul

45°C Hyb for 30 min.

20°C Wash with 6X SSPE, 0.005% Triton X-100; 4 cycles / 10 drain-fills.
Scan chip at 530 nm, 11.25 um pixel size.

Example 18
Alternate Labeling Procedures 

Ligation assay 

RNA can be directly labeled by ligating an A6 RNA oligonucleotide
with biotin at the 5' end with RNA ligase. Cre, a bacterial gene, was transcribed with
T7 RNA polymerase to generate an antisense RNA. The RNA was fragmented and
kinased with polynucleotide kinase to generate 5' phosphorylated ends. The Biotin A6
RNA was then ligated using T4 RNA ligase. 5 pM of ligated RNA was tested on gene
expression chips along with the labeled Cre.

Direct labeling of 3' RNA using Poly A polymerase

Poly A polymerase has been used to catalyze the addition of a poly A tail onto the free 3'
hydroxyl terminus of RNA utilizing ATP as a precursor. Recently, it was reported by
Joomyeong Kim et al. (1995) Nucl. Acids Res., 23(12): 2245-2251, that they
successfully used poly A polymerase to tail 3' RNA with CTP. This method can be
used to label fragmented RNA with biotin CTP to generate labeled target.

The advantage of this method is that sense RNA (mRNA) can be
directly labeled by biotin CTP. Antisense RNA can also be labeled after
fragmentation. The consumption of CTP can be cut down by 1/5th compared to an
IVT reaction.

Example 19 
Direct Labeling Protocol 

Reagents for direct labeling mRNA

1) 100 uM rATP, 200 uL

198 uL DEPC H2O
2 uL (10 mM) rATP

2) 100 ug/ml BSA

NEB Acetylated BSA

3) 30 mM DTT

4) 10 U/uL polynucleotide kinase

Boehringer Mannheim 3' phosphatase free cat # 83829

5) 1 nmole/uL BioA6

Genetics Institute

6) 5 U/uL T4 RNA Ligase + 10 X T4 RNA Ligase Buffer

Epicentre Technologies, catalogue # LR5025

7) 5 X RNA Fragmentation Buffer

200 mM Tris-Acetate, pH 8.1
500 mM KOAc
150 mM MgOAc

Direct Labeling Protocol 
Fragmentation 

Add to a 1.5 ml sterile tube:

8 uL poly(A)+ RNA in DEPC-H2O (1 ug)
2 uL 5 X RNA Fragmentation Buffer
Heat to 94°C for 35 minutes.

Kinase Reaction

Add to the 10 uL fragmented RNA:
2.4 uL rATP (100 uM)
2 uL BSA (100 ug/ml)
2 uL DTT (30 mM)
1.6 uL DEPC-H2O
2 uL polynucleotide kinase (10 U/uL)
Incubate at 37°C for 2.5 hours. Heat to 94°C for 2 minutes (heat kill
enzyme).

T4 RNA Ligase Reaction

Add to the 20 uL kinased RNA:

0.5 uL BioA6 (1 nmole/uL in DEPC-H2O)
3 uL rATP (19 mM)
3 uL 10 X T4 RNA Ligase buffer
0.5 uL DEPC-H2O
17°C overnight to 2 days. 94°C for 2 minutes.

Example 20 

Computer Algorithms to Perform Basecalling on a Target DNA
Sample Hybridized or Ligated to Generic DNA Arrays.

Basecalling on a generic DNA array is similar to basecalling
with customized resequencing GeneChips, except that, unlike the custom GeneChips,
which physically place a single series of tiling probes on the chip, a generic
array carries all possible n-mer sequences. In general, to resequence a target, the target is
decomposed into an n-mer complement word spectrum of tiling probes, which tile the
sequence with n-mer words (Fig. 21). For each [...] (Fig. 24). To make a basecall at a
given position within the target, the intensity of the tiling probe at that position is
compared to the intensities of its "nearest-neighbors" [...] because the single base
substitution can occur at n different positions within the probe. The
probe that yields the highest intensity is the base called for that position within the
target (Fig. 25). The advantage of using a generic DNA array vs. the standard custom
resequencing GeneChips is that generic arrays make n base calls for each base within
the target, whereas the custom resequencing GeneChips make only a single base call.

The final basecall of a target base is decided upon by an electronic vote
of the base calls from the n different electronic tilings at each target position (Fig. 26).
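
The per-tiling call and the electronic vote can be sketched as below. The data layout (one dictionary of base-to-intensity per tiling) and the tie rule (report "N") are assumptions made for illustration; the text does not specify how ties are resolved.

```python
from collections import Counter


def call_base(intensities):
    """Call the base whose probe is brightest within one tiling.

    `intensities` maps each candidate base at the interrogated position
    to the intensity of the corresponding probe (the tiling probe and
    its single-base substitution neighbors).
    """
    return max(intensities, key=intensities.get)


def vote_basecall(tilings):
    """Majority vote over the calls made by the n electronic tilings."""
    counts = Counter(call_base(t) for t in tilings).most_common()
    # Assumed tie rule: no call ("N") when the top two counts are equal.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "N"
    return counts[0][0]


if __name__ == "__main__":
    tilings = [
        {"A": 10, "C": 2, "G": 1, "T": 3},
        {"A": 4, "C": 9, "G": 1, "T": 2},
        {"A": 8, "C": 1, "G": 2, "T": 1},
    ]
    print(vote_basecall(tilings))
```

In this hypothetical example two of the three tilings call A, so the vote returns A even though one tiling miscalls the position.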



Empirically using the accuracy of the basecalls derived from the n electronic tiling
arrays to filter out inaccurate electronic tilings



A given reference DNA sample is hybridized/ligated to a generic DNA
array. A set of n electronic tilings is generated and the corresponding basecalls
are made. A correctness score table is generated by giving a score of 1 if a given tiling
substitution series makes a correct basecall or a score of 0 if the basecall is incorrect
(Fig. 27). A confidence level for a given basecall can also be attached to each scoring
[...]. A variant DNA sample is then hybridized/ligated to a second generic DNA array.
Again a set of n electronic tilings is generated, except this time all tilings at
[...] and only those tilings [...]. The
result is to dramatically improve the overall percentage of correct basecalls.
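
One plausible reading of this filtering step is sketched below: tilings that miscalled a position on the reference are excluded from the vote at that position on the variant. Because part of the original passage is not recoverable, this interpretation, the data layout, and the "N" no-call rule are all assumptions.

```python
from collections import Counter


def correctness_table(reference_seq, reference_calls):
    """Score 1 where a tiling called the known reference base, else 0.

    reference_calls[pos][k] is the base called by tiling k at position pos.
    """
    return [
        [1 if call == ref_base else 0 for call in calls]
        for ref_base, calls in zip(reference_seq, reference_calls)
    ]


def filtered_vote(calls, scores):
    """Vote using only the tilings that scored 1 on the reference."""
    trusted = [call for call, score in zip(calls, scores) if score == 1]
    if not trusted:
        return "N"  # assumed: no call when no tiling is trusted
    counts = Counter(trusted).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "N"  # assumed tie rule
    return counts[0][0]


def call_variant(variant_calls, table):
    """Apply the filtered vote at every target position."""
    return "".join(filtered_vote(c, s) for c, s in zip(variant_calls, table))


if __name__ == "__main__":
    table = correctness_table("AC", [["A", "A", "G"], ["C", "T", "T"]])
    print(table)
    print(call_variant([["A", "A", "T"], ["G", "T", "T"]], table))
```

In the hypothetical data, an unfiltered vote at the second position would return T, while the filtered vote relies on the single tiling that was correct on the reference.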

Comparison of a reference and a variant as a method of detecting a mutation.

For a given n-mer generic array, the ability to correctly resequence a
target decreases as the complexity of the target increases. As the target complexity
increases, the cross-[...] between nearest neighbors at different positions increases,
[...] bases within the target. The comparison of a sample target against a reference
[...] detection.

One method of comparison between the reference and sample is to
compare the intensities of the tiling probes themselves. However, before a direct
comparison can be made, it is necessary to
account for both chip to chip and sample to sample variation. I employed a local
normalization process to normalize the signals. By "local" normalization, I simply
divide the intensity of the tiling probe by the sum of the intensities of its nearest
neighbors (Fig. 29).

This method of normalization creates good signal tracking between samples and is
sensitive to the presence of a mutation, indicated by the formation of a "bubble"
(Fig. 30). This "local" normalization tiling probe comparison can be further
transformed by difference analysis and smoothing to a format where the presence of a
mutation is more easily visualized.
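
A minimal sketch of the local normalization, followed by a difference trace and smoothing, is given below. The division by the summed nearest-neighbor intensities follows the text; the centered moving-average smoother and its window size are assumptions, since the original does not say which smoothing was used.

```python
def local_normalize(tiling_intensity, neighbor_intensities):
    """Tiling probe intensity divided by the sum of its nearest neighbors."""
    total = sum(neighbor_intensities)
    if total <= 0:
        raise ValueError("neighbor intensities must sum to a positive value")
    return tiling_intensity / total


def normalized_trace(tiling, neighbors):
    """Normalize every position along the target."""
    return [local_normalize(t, n) for t, n in zip(tiling, neighbors)]


def difference_trace(reference_norm, sample_norm):
    """Sample minus reference at each position."""
    return [s - r for r, s in zip(reference_norm, sample_norm)]


def smooth(values, window=3):
    """Centered moving average; the window shrinks at the ends (assumed)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    ref = normalized_trace([100, 100], [[20, 20, 10], [20, 20, 10]])
    sam = normalized_trace([100, 50], [[20, 20, 10], [40, 40, 20]])
    print(difference_trace(ref, sam))
    print(smooth([0, 0, 3, 0, 0]))
```

Because each probe is scaled by its own neighbors on the same chip, overall brightness differences between chips or samples cancel, and a position where the sample departs from the reference shows up as a nonzero stretch in the difference trace.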

Induced Difference method for detecting mutations.

Another method for using comparisons between a reference and a
sample to detect mutations is via mutational "induced differences" between tiling
probes and their nearest neighbors. Application of this method to a first order nearest
neighbor tiling analysis involves comparing "locally normalized" probes in the
reference target to the corresponding probe in the sample target. Tilings that were
uninformative in part II, because they miscalled the base, may now be informative
because certain probe members within that tiling can be induced (caused to increase or
decrease in intensity) between the reference and the sample, indicating the presence of
a mutation (Fig. 31). These inductions are summed over all the tilings on both the
forward and reverse strand for a given target position, and the resultant number is a
measure of whether a mutation is present or not (Fig. 32, Fig. 33).
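
The summation of inductions can be illustrated with the sketch below. The text does not say how an induction is quantified, so the rule used here (a locally normalized probe counts as induced when it changes by at least a chosen fold in either direction) and the default threshold are hypothetical.

```python
def induced(ref_norm, sample_norm, fold=2.0):
    """True if the normalized intensity rose or fell by at least `fold`.

    The fold-change criterion is an assumed stand-in for whatever
    measure of induction the original analysis used.
    """
    if ref_norm <= 0 or sample_norm <= 0:
        return False
    ratio = sample_norm / ref_norm
    return ratio >= fold or ratio <= 1.0 / fold


def induction_score(ref_tilings, sample_tilings, fold=2.0):
    """Sum inductions over all tilings (both strands) at one position.

    Each argument is a list of tilings; each tiling is a list of locally
    normalized probe values in matching order.
    """
    return sum(
        induced(r, s, fold)
        for ref_probes, sample_probes in zip(ref_tilings, sample_tilings)
        for r, s in zip(ref_probes, sample_probes)
    )


if __name__ == "__main__":
    ref = [[1.0, 0.2, 0.2, 0.2], [1.0, 0.1, 0.1, 0.1]]
    sample = [[0.4, 0.2, 0.9, 0.2], [1.0, 0.1, 0.1, 0.1]]
    print(induction_score(ref, sample))
```

A score of zero means no probe changed between reference and sample at that position; larger scores indicate more induced probes and so more support for a mutation.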

Example 21 

Use of Inosine on the 5' ends of the MenPoc synthesized probes to
increase duplex stability and increase the resultant ligation signal on
Generic Ligation GeneChips.

We investigated adding degenerate bases, such as inosine
(pairs with all other bases), to the end of the MenPoc synthesized probes to increase
duplex stability. We found that the addition of 1-6 inosines onto the end of
the probes did increase the signal intensity in both hybridization and ligation
reactions on a Generic Ligation GeneChip and allowed us to ligate at higher
temperatures.

Inosines (0-6) are placed at the 5' end of the probe during
manufacturing, and the effects of these terminal inosines are assayed by ligating a
DNase I digested, TdT labeled 788 bp DNA fragment to the chips. The increased
brightness with 2-6 inosines indicated an enhancement of duplex stability. With 6
inosines there is a slight decrease in intensity compared to 2-4 inosines because the
terminal inosines are probably starting to form quartet-like secondary structures.

Example 22 

Comparison between the specificity of T4 ligase and Taq ligase when
used on a Generic Ligation GeneChip.

We investigated whether T4 ligase or Taq ligase was more specific in
ligating target to the Generic Ligation GeneChip. In order to use Taq ligase, we need to
perform the ligation reaction at 40 degrees C or higher. Consequently, we used an
8-mer chip with 6 inosines at the end of the MenPoc probes to increase the thermal
stability of the duplexes. This allowed us to perform the Taq ligase reaction at 44
degrees C and compare this to a T4 ligation reaction at 37 degrees C. Our results
indicate that Taq is much more specific than T4 ligase, and ligates a set of target
ends that T4 ligase is unable to ligate.

Taq lights up fewer features but with a brighter intensity than T4 does,
indicating the specificity of Taq versus T4.
Intensity profiles of the tiling probes and nearest neighbor substitutions
at given probe positions within the target illustrate that Taq is more specific than T4
and that Taq detects signal intensity at probes where T4 fails to detect signal.

It is understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof
will be suggested to persons skilled in the art and are to be included within the spirit
and purview of this application and scope of the appended claims. All publications,
patents, and patent applications cited herein are hereby incorporated by reference for
all purposes.



